Device and method for data compression/decompression.



(57) An apparatus produces an encoded and com- 
pressed digital data stream from an original 
input digital data stream using a forward dis- 
crete wavelet transform and a tree encoding 
method. The input digital data stream may be a 
stream of video image data values in digital 
form. The apparatus is also capable of produc- 
ing a decoded and decompressed digital data 
stream closely resembling the originally input 
digital data stream from an encoded and com- 
pressed digital data stream using a correspond- 
ing tree decoding method and a corresponding 
inverse discrete wavelet transform. A dual con- 
volver is disclosed which performs both bound- 
ary and nonboundary filtering for forward 
transform discrete wavelet processing and 
which also performs filtering of corresponding 
inverse transform discrete wavelet processes. A 
portion of the dual convolver is also usable to 
filter an incoming stream of digital video image 
data values before forward discrete wavelet 
processing. Methods and structures for 
generating the addresses to read/write data 
values from/to memory as well as for reducing 
the total amount of memory necessary to store 
data values are also disclosed.

CROSS REFERENCE TO PAPER APPENDICES

Appendix A, which is a part of the present disclosure, is a paper appendix of 6 pages. Appendix A is a
description of a CONTROL_ENABLE block contained in the tree processor/encoder-decoder portion of a video
encoder/decoder integrated circuit chip, written in the VHDL hardware description language.

Appendix B, which is a part of the present disclosure, is a paper appendix of 10 pages. Appendix B is a 
description of a MODE_CONTROL block contained in the tree processor/encoder-decoder portion of a video 
encoder/decoder integrated circuit chip, written in the VHDL hardware description language. 

Appendix C, which is a part of the present disclosure, is a paper appendix of 11 pages. Appendix C is a
description of a CONTROL_COUNTER block contained in the tree processor/encoder-decoder portion of a
video encoder/decoder integrated circuit chip, written in the VHDL hardware description language.

A portion of the disclosure of this patent document contains material which is subject to copyright
protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or
the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise
reserves all copyright rights whatsoever. The VHDL hardware description language of Appendices A, B and C
is an international standard, IEEE Standard 1076-1987, and is described in the "IEEE Standard VHDL
Language Reference Manual".

Appendix D, which is a part of the present disclosure, is a paper appendix of 181 pages.
Appendix D is a description of one embodiment of a video encoder/decoder integrated circuit chip in the VHDL
hardware description language. The VHDL hardware description language of Appendix D is an international
standard, IEEE Standard 1076-1987, and is described in the "IEEE Standard VHDL Language Reference
Manual". The "IEEE Standard VHDL Language Reference Manual" can be obtained from the Institute of Electrical
and Electronics Engineers, Inc., 445 Hoes Lane, Piscataway, New Jersey 08855, telephone 1-800-678-4333.

DESCRIPTION

This invention relates to a method and apparatus for compressing, decompressing, transmitting, and/or 
storing digitally encoded data. In particular, this invention relates to the compression and decompression of 
digital video image data. 

An apparatus produces an encoded/compressed digital data stream from an original input digital data
stream using a discrete wavelet transform and a tree encoding method. The apparatus is also capable of
producing a decoded/decompressed digital data stream closely resembling the originally input digital data stream
from an encoded/compressed digital data stream using a corresponding tree decoding method and a
corresponding inverse discrete wavelet transform.

The apparatus comprises a discrete wavelet transform circuit which is capable of being configured to
perform either a discrete wavelet transform or a corresponding inverse discrete wavelet transform. The discrete
wavelet transform circuit comprises an address generator which generates the appropriate addresses to
access data values stored in memory. Methods and structures for reducing the total amount of memory
necessary to store data values and for taking advantage of various types of memory devices including dynamic
random access memory (DRAM) devices are disclosed. A convolver circuit of the discrete wavelet transform circuit
performs both boundary and non-boundary filtering for the forward discrete wavelet transform and performs
start, odd, even and end reconstruction filtering for the inverse discrete wavelet transform. The convolver may
serve the dual functions of 1) reducing the number of image data values before subsequent forward discrete
wavelet transforming, and 2) operating on the reduced number of image data values to perform the forward
discrete wavelet transform.

The apparatus also comprises a tree processor/encoder-decoder circuit which is configurable in an
encoder mode or in a decoder mode. In the encoder mode, the tree processor/encoder-decoder circuit generates
addresses to traverse trees of data values of a sub-band decomposition, generates tokens, and quantizes and
Huffman encodes selected transformed data values stored in memory. In the decoder mode, the tree
processor/encoder-decoder circuit receives Huffman encoded data values and tokens, Huffman decodes and inverse
quantizes the encoded data values, recreates trees of transformed data values from the tokens and data
values, and stores the recreated trees of data values in memory.

The apparatus is useful in, but not limited to, the fields of video data storage, video data transmission,
television, video telephony, computer networking, and other fields of digital electronics in which efficient
storage and/or transmission and/or retrieval of digitally encoded data is needed. The apparatus facilitates the
efficient and inexpensive compression and storage of video and/or audio on compact laser discs (commonly
known as CDs) as well as the efficient and inexpensive storage of video and/or audio on digital video tapes
(commonly known as VCR or "video cassette recorder" tapes). Similarly, the invention facilitates the efficient
and inexpensive retrieval and decompression of video and/or audio from digital data storage media including
CDs and VCR tapes.

The invention is further described below, by way of example, with reference to the accompanying drawings, 
in which: 

Figure 1 is a block diagram of an expansion printed circuit board which is insertable into a card slot of a
personal computer.

Figure 2 is a block diagram of an embodiment of the analog/digital video decoder chip depicted in Figure 1.

Figures 3A-C illustrate a 4:1:1 luminance-chrominance-chrominance format (Y:U:V) used by the expansion
board of Figure 1.

Figure 4 is an illustration of a timeline of the output values output from the analog/digital video decoder 
chip of Figures 1 and 2. 

Figure 5 is a block diagram of the discrete wavelet transform circuit of the video encoder/decoder chip of 
Figure 1. 

Figure 6 is a block diagram of the row convolver block of Figure 5.

Figure 7 is a block diagram of the column convolver block of Figure 5. 

Figure 8 is a block diagram of the wavelet transform multiplier circuit blocks of Figures 6 and 7. 

Figure 9 is a block diagram of the row wavelet transform circuit block of Figure 6. 

Figure 10 is a diagram illustrating control signals which control the row convolver of Figure 5 and signals
output by the row convolver of Figure 5 during a forward octave 0 transform.

Figure 11 is a diagram showing data flow in the row convolver of Figure 5 during a forward octave 0 trans- 
form. 

Figure 12 is a diagram illustrating data values output by the row convolver of Figure 5 during the forward 
octave 0 transform. 

Figure 13 is a block diagram of the column wavelet transform circuit block of Figure 7.

Figure 14 is a diagram illustrating control signals which control the column convolver of Figure 5 and
signals output by the column convolver of Figure 5 during a forward octave 0 transform.

Figure 15 is a diagram showing data flow in the column convolver of Figure 5 during a forward octave 0
transform.

Figure 16 is a diagram illustrating data values present in memory unit 116 of Figure 1 after operation of
the column convolver of Figure 5 during the forward octave 0 transform.

Figure 17 is a diagram showing control signals controlling the row convolver of Figure 5 and signals output 
by the row convolver of Figure 5 during a forward octave 1 transform. 

Figure 18 is a diagram showing data flow in the row convolver of Figure 5 during a forward octave 1
transform.

Figure 19 is a diagram showing control signals controlling the column convolver of Figure 5 and signals 
output by the column convolver of Figure 5 during a forward octave 1 transform. 

Figure 20 is a diagram showing data flow in the column convolver of Figure 5 during a forward octave 1 
transform. 

Figure 21 is a block diagram of one embodiment of the control block 506 of the discrete wavelet transform
circuit of Figure 5.

Figure 22 is a diagram showing control signals controlling the column convolver of Figure 5 and signals
output by the column convolver of Figure 5 during an inverse octave 1 transform.

Figure 23 is a diagram showing data flow in the column convolver of Figure 5 during an inverse octave 1
transform.

Figure 24 is a diagram showing control signals controlling the row convolver of Figure 5 and signals output 
by the row convolver of Figure 5 during an inverse octave 1 transform. 

Figure 25 is a diagram showing data flow in the row convolver of Figure 5 during an inverse octave 1 trans- 
form. 

Figure 26 is a diagram showing control signals controlling the column convolver of Figure 5 and signals
output by the column convolver of Figure 5 during an inverse octave 0 transform.

Figure 27 is a diagram showing data flow in the column convolver of Figure 5 during an inverse octave 0 
transform. 

Figure 28 is a diagram showing control signals controlling the row convolver of Figure 5 and signals output
by the row convolver of Figure 5 during an inverse octave 0 transform.

Figure 29 is a diagram showing data flow in the row convolver of Figure 5 during an inverse octave 0
transform.

Figure 30 is a block diagram of the DWT address generator block of the discrete wavelet transform circuit
of Figure 5.

Figure 31 is a block diagram of the tree processor/encoder-decoder circuit 124 of Figure 1, simplified to
illustrate an encoder mode.

Figure 32 is a block diagram of the tree processor/encoder-decoder circuit 124 of Figure 1, simplified to
illustrate a decoder mode.

Figure 33 is a block diagram of the decide circuit block 3112 of the tree processor/encoder-decoder of Fig- 
ures 31-32. 

Figure 34 is a block diagram of the tree processor address generator TP_ADDR_GEN block 3114 of the
tree processor/encoder-decoder of Figures 31-32.

Figure 35 illustrates the state table for the CONTROL_ENABLE block 3420 of the tree processor address
generator of Figure 34.

Figure 36 is a graphical illustration of the tree decomposition process, illustrating the states and corre- 
sponding octaves of Figure 35. 

Figure 37 is a block diagram of the quantizer circuit block 3116 of the tree processor/encoder-decoder of
Figures 31-32.

Figure 38 is a block diagram of the buffer block 3122 of the tree processor/encoder-decoder of Figures 
31-32. 

Figure 39 is a diagram of the buffer block 3122 of Figure 38 which has been simplified to illustrate buffer
block 3122 operation in the encoder mode.

Figure 40 illustrates the output of barrel shifter 3912 of buffer block 3122 when buffer block 3122 is in the
encoder mode as in Figure 39.

Figure 41 is a diagram of the buffer block 3122 of Figure 38 which has been simplified to illustrate buffer
block 3122 operation in the decoder mode.

Figure 42 illustrates a pipelined encoding-decoding scheme used by the tree processor/encoder-decoder
124 of Figures 31 and 32.

Figure 43 is a block diagram of another embodiment in accordance with the present invention in which the 
Y:U:V input is in a 4:2:2 format. 

Figure 44 illustrates a sequence in which luminance data values are read from and written to the new
portion of memory unit 116 of the PC board 100 in a first embodiment in accordance with the invention in which
memory unit 116 is realized as a static random access memory (SRAM).

Figure 45 illustrates a sequence in which luminance data values are read from and written to the new
portion of memory unit 116 of the PC board 100 in a second embodiment in accordance with the present invention
in which memory unit 116 is realized as a dynamic random access memory (DRAM).

Figure 46 illustrates a third embodiment in accordance with the present invention in which memory unit
116 of the PC board 100 is realized as a dynamic random access memory and in which a series of static random
access memories are used as cache buffers between tree processor/encoder-decoder 124 and memory unit
116.

Figure 47 illustrates a time line of the sequence of operations of the circuit illustrated in Figure 46. 

Figure 1 illustrates a printed circuit expansion board 100 which is insertable into a card slot of a personal
computer. Printed circuit board 100 may be used to demonstrate features in accordance with various aspects
of the present invention. Printed circuit board 100 receives an analog video signal 101 from an external video
source 104 (such as a CD player), converts information in the analog video signal into data in digital form,
transforms and compresses the data, and outputs compressed data onto a computer data bus 106 (such as an
ISA/NUBUS parallel bus of an IBM PC or IBM PC compatible personal computer). While performing this
compression function, the board 100 can also output a video signal which is retrievable from the compressed data.
This video signal can be displayed on an external monitor 108. This allows the user to check visually the quality
of images which will be retrievable later from the compressed data while the compressed data is being
generated. Board 100 can also read previously compressed video data from data bus 106 of the personal computer,
decompress and inverse-transform that data into an analog video signal, and output this analog video signal
to the external monitor 108 for display.

Board 100 comprises an analog-to-digital video decoder 110, a video encoder/decoder integrated circuit
chip 112, two static random access memory (SRAM) memory units 114 and 116, a display driver 118, and a
first-in-first-out memory 120. Analog-to-digital (A/D) video decoder 110 converts incoming analog video signal
101 into a digital format. Video encoder/decoder chip 112 receives the video signal in the digital format and
performs a discrete wavelet transform (DWT) function, and then a tree processing function, and then a
Huffman encoding function to produce a corresponding compressed digital data stream. Memory unit 116 stores
"new" and "old" DWT-transformed video frames.

Video encoder/decoder chip 112 comprises a discrete wavelet transform circuit 122 and a tree
processor/encoder-decoder circuit 124. The discrete wavelet transform circuit 122 performs either a forward discrete
wavelet transformation or an inverse discrete wavelet transformation, depending on whether the chip 112 is
configured to compress video data or to decompress compressed video data. Similarly, the tree
processor/encoder-decoder circuit 124 either encodes wavelet-transformed images into a compressed data stream or
decodes a compressed data stream into decompressed images in wavelet transform form, depending on whether
the chip 112 is configured to compress or to decompress video data. Video encoder/decoder chip 112 is also
coupled to computer bus 106 via a download register bus 128 so that the discrete wavelet transform circuit
122 and the tree processor/encoder-decoder circuit 124 can receive control values (such as a value indicative
of image size) from ISA bus 106. The control values are used to control the transformation, tree processing,
and encoding/decoding operations. FIFO buffer 120 buffers data flow between the video encoder/decoder chip
112 and the data bus 106. Memory unit 114 stores a video frame in uncompressed digital video format. Display
driver chip 118 converts digital video data from either decoder 110 or from memory unit 114 into an analog
video signal which can be displayed on external monitor 108.

Figure 2 is a block diagram of analog/digital video decoder 110. Analog/digital video decoder 110 converts
the analog video input signal 101 into one 8-bit digital image data output signal 202 and two digital video SYNC
output signals 201. The 8-bit digital image output signal 202 contains the pixel luminance values, Y, time
multiplexed with the pixel chrominance values, U and V. The video SYNC output signals 201 comprise a horizontal
synchronization signal and a vertical synchronization signal.

Figures 3A-C illustrate a 4:1:1 luminance-chrominance-chrominance format (Y:U:V) used by board 100.
Because the human eye is less sensitive to chrominance variations than to luminance variations, chrominance
values are subsampled such that each pixel shares an 8-bit chrominance value U and an 8-bit chrominance
value V with three of its neighboring pixels. The four pixels in the upper-left hand corner of the image, for
example, are represented by {Y00, U00, V00}, {Y01, U00, V00}, {Y10, U00, V00}, and {Y11, U00, V00}. The next four pixels
to the right are represented by {Y02, U01, V01}, {Y03, U01, V01}, {Y12, U01, V01}, and {Y13, U01, V01}. A/D video decoder
110 serially outputs all the 8-bit Y-luminance values of a frame, followed by all the 8-bit U-chrominance values
of the frame, followed by all the 8-bit V-chrominance values of the frame. The Y, U and V values for a frame
are output every 1/30 of a second. A/D video decoder 110 outputs values in raster-scan format so that a row
of pixel values Y00, Y01, Y02 ... is output followed by a second row of pixel values Y10, Y11, Y12 ... and so forth
until all the values of the frame of Figure 3A are output. The values of Figure 3B are then output row by row
and then the values of Figure 3C are output row by row. In this 4:1:1 format, each of the U and V components
of the image contains one quarter of the number of data values contained in the Y component.
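The sharing of chrominance values among neighboring pixels can be sketched in a few lines of Python. This is only an illustration: the function names `chroma_index` and `plane_sizes` are hypothetical, and the 2 x 2 grouping is inferred from the pixel examples given above, in which the pixels at rows 0-1 and columns 0-1 share one U value and one V value.

```python
def chroma_index(row, col):
    """Map a luminance pixel position to the (row, col) of its shared U/V value.

    Assumes each 2 x 2 block of pixels shares one U and one V value, as in the
    upper-left corner example of the text.
    """
    return row // 2, col // 2


def plane_sizes(width, height):
    """Return the number of Y, U and V values in one frame (even dimensions assumed)."""
    y_count = width * height
    # Each chrominance plane holds one quarter as many values as the Y plane.
    return y_count, y_count // 4, y_count // 4
```

Under these assumptions an eight-by-eight luminance matrix is accompanied by two four-by-four chrominance matrices, since `plane_sizes(8, 8)` gives 64 Y values and 16 values each for U and V.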

Figure 4 is a diagram of a timeline of the output of A/D video decoder 110. The bit rate of the decoder output
is equal to 30 frames/sec x 12 bits/pixel x the number of pixels per frame. For a 640 x 480 pixel image, for
example, the data rate is approximately 110 x 10^6 bits/second. A/D video decoder 110 also detects the horizontal
and vertical synchronization signals in the incoming analog video input signal 101 and produces corresponding
digital video SYNC output signals 201 to the video encoder/decoder chip 112.
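The arithmetic behind the decoder output rate can be checked with a short Python sketch. The helper names are hypothetical, and the 640 x 480 frame size used in the usage note is an assumption chosen because it reproduces a rate of roughly 110 x 10^6 bits/second; the 12 bits/pixel figure follows from one 8-bit Y value per pixel plus one 8-bit U value and one 8-bit V value shared among four pixels.

```python
BITS_PER_SAMPLE = 8      # each Y, U and V value is 8 bits wide
PIXELS_PER_CHROMA = 4    # one U and one V value are shared by four pixels


def bits_per_pixel():
    """Average number of bits per pixel in the 4:1:1 format."""
    luminance = BITS_PER_SAMPLE
    chrominance = 2 * (BITS_PER_SAMPLE // PIXELS_PER_CHROMA)  # U share + V share
    return luminance + chrominance


def bit_rate(width, height, frames_per_second=30):
    """Uncompressed decoder output rate in bits per second."""
    return width * height * bits_per_pixel() * frames_per_second
```

For example, `bit_rate(640, 480)` evaluates to 110,592,000 bits/second.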

The video encoder/decoder integrated circuit chip 112 has two modes of operation. It can either transform
and compress ("encode") a video data stream into a compressed data stream or it can inverse transform and
decompress ("decode") a compressed data stream into a video data stream. In the compression mode, the
digital image data 202 and the synchronization signals 201 are passed from the A/D video decoder 110 to the
discrete wavelet transform circuit 122 inside the video encoder/decoder chip 112. The discrete wavelet
transform circuit 122 performs a forward discrete wavelet transform operation on the image data and stores the
resulting wavelet-transformed image data in the "new" portion of memory unit 116. At various times during this
forward transform operation, the "new" portion of memory unit 116 stores intermediate wavelet transform
results, such that certain of the memory locations of memory unit 116 are read and overwritten a number of times.
The number of times the memory locations are overwritten corresponds to the number of octaves in the wavelet
transform. After the image data has been converted into a sub-band decomposition of wavelet-transformed
image data, the tree processor/encoder-decoder circuit 124 of encoder/decoder chip 112 reads
wavelet-transformed image data of the sub-band decomposition from the "new" portion of memory 116, processes it, and
outputs onto lines 130 a compressed ("encoded") digital data stream to FIFO buffer 120. During this tree
processing and encoding operation, the tree processor/encoder-decoder circuit 124 also generates a quantized
version of the encoded first frame and stores that quantized version in the "old" portion of memory unit 116.
The quantized version of the encoded first frame is used as a reference when a second frame of
wavelet-transformed image data from the "new" portion of memory unit 116 is subsequently encoded and output to bus 106.
While the second frame is encoded and output to bus 106, a quantized version of the encoded second frame
is written to the "old" portion of memory unit 116. Similarly, the quantized version of the encoded second frame
in the "old" portion of memory unit 116 is later used as a reference for encoding a third frame of image data.
In the decompression mode, compressed ("encoded") data is written into FIFO 120 from data bus 106 and
is read from FIFO 120 into tree processor/encoder-decoder circuit 124 of the video encoder/decoder chip 112.
The tree processor/encoder-decoder circuit 124 decodes the compressed data into decompressed
wavelet-transformed image data and then stores the decompressed wavelet-transformed image data into the "old"
portion of memory unit 116. During this operation, the "new" portion of memory unit 116 is not used. Rather, the
tree processor/encoder-decoder circuit 124 reads the previous frame stored in the "old" portion of memory
unit 116 and modifies it with information from the data stream received from FIFO 120 in order to generate
the next frame. The next frame is written over the previous frame in the same "old" portion of the memory unit
116. Once the decoded wavelet-transformed data of a frame of image data is present in the "old" portion of
memory unit 116, the discrete wavelet transform circuit 122 accesses memory unit 116 and performs an inverse
discrete wavelet transform operation on the frame of image data. For each successive octave of the inverse
transform, certain of the memory locations in the "old" portion of memory unit 116 are read and overwritten.
The number of times the locations are overwritten corresponds to the number of octaves in the wavelet
transform. On the final octave of the inverse transform, which converts the image data from the octave-0 transform
domain into the standard image domain, the discrete wavelet transform circuit 122 writes the resulting decompressed
and inverse-transformed image data into memory unit 114. The decompressed and inverse-transformed image
data may also be output to the video display driver 118 and displayed on monitor 108.

Figure 5 is a block diagram of the discrete wavelet transform circuit 122 of video encoder/decoder chip
112. The discrete wavelet transform circuit 122 shown enclosed by a dashed line comprises a row convolver
block CONV_ROW 502, a column convolver block CONV_COL 504, a control block 506, a DWT address
generator block 508, a REGISTERS block 536, and three multiplexers, mux1 510, mux2 512, and mux3 514. In
order to transform a frame of digital video image data received from A/D video decoder 110 into the wavelet
transform domain, a forward two dimensional discrete wavelet transform is performed. Similarly, in order to
return the wavelet transform digital data values of the frame into a digital video output suitable for displaying
on a monitor such as 108, an inverse two dimensional discrete wavelet transform is performed. In the presently
described embodiment of the present invention, four coefficient quasi-Daubechies digital filters are used as
set forth in the copending Patent Cooperation Treaty (PCT) application filed March 30, 1994 entitled "Data
Compression and Decompression".

The discrete wavelet transform circuit 122 shown in Figure 5 performs a forward discrete wavelet transform
as follows. First, a stream of 8-bit digital video image data values is supplied, one value at a time, to the discrete
wavelet transform circuit 122 via eight leads 516. The digital video image data values are coupled through
multiplexer mux1 510 to the input leads 518 of the row convolver CONV_ROW block 502. The output leads 520
of CONV_ROW block 502 are coupled through multiplexer mux2 512 to input leads 522 of the CONV_COL
block 504. The output leads 524 of CONV_COL block 504 are coupled to data leads 526 of memory unit 116
through multiplexer mux3 514 so that the data values output from CONV_COL block 504 can be written to the "new"
portion of frame memory unit 116. The writing of the "new" portion of memory unit 116 completes the first pass,
or octave, of the forward wavelet transform. To perform the next pass, or octave, of the forward wavelet
transform, low pass component data values of the octave 0 transformed data values are read from memory unit
116 and are supplied to input leads 518 of CONV_ROW block 502 via input leads 526, lines 528 and multiplexer
mux1 510. The flow of data proceeds through row convolver CONV_ROW block 502 and through column
convolver CONV_COL block 504 with the data output from CONV_COL block 504 again being written into memory
unit 116 through multiplexer mux3 514 and leads 526. Control block 506 provides control signals to mux1 510,
mux2 512, mux3 514, CONV_ROW block 502, CONV_COL block 504, DWT address generator block 508, and
memory unit 116 during this process. This process is repeated for each successive octave of the forward
transform. The data values read from memory unit 116 for the next octave of the transform are the low pass values
written to the memory unit 116 on the previous octave of the transform.
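The octave-by-octave data flow described above (a row pass, then a column pass, then a repeat on the low pass values only) can be sketched in Python as follows. This is a structural illustration under loud assumptions: `split_1d` uses a simple pairwise average and difference as a stand-in for the four coefficient quasi-Daubechies filters, plain lists stand in for memory unit 116, and none of the function names come from the disclosure.

```python
def split_1d(values):
    """Stand-in analysis step (pairwise average/difference), NOT the patent's filters."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def row_pass(matrix):
    """Filter every row; plays the role of the CONV_ROW stage."""
    lows, highs = [], []
    for row in matrix:
        low, high = split_1d(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def column_pass(matrix):
    """Filter every column; plays the role of the CONV_COL stage."""
    lows_t, highs_t = row_pass(transpose(matrix))
    return transpose(lows_t), transpose(highs_t)


def forward_transform(image, octaves):
    """Return the final low pass block and the detail blocks of each octave."""
    low = image
    details = []
    for octave in range(octaves):
        row_low, row_high = row_pass(low)
        low_low, low_high = column_pass(row_low)
        high_low, high_high = column_pass(row_high)
        details.append((octave, low_high, high_low, high_high))
        low = low_low  # only the low pass values feed the next octave
    return low, details
```

With an eight-by-eight input and two octaves, the low pass block shrinks from 8 x 8 to 4 x 4 and then to 2 x 2, which mirrors the way each later octave re-reads only the low pass values written on the previous octave.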

The operations performed to carry out the inverse discrete wavelet transform proceed in an order
substantially opposite the operations performed to carry out the forward discrete wavelet transform. The frame
of image data begins in the transformed state in memory unit 116. For example, if the highest octave in the
forward transform (OCT) is octave 1, then transformed data values are read from memory unit 116 and are
supplied to the input leads 522 of the CONV_COL block 504 via leads 526, lines 528 and multiplexer mux2
512. The data values output from CONV_COL block 504 are then supplied to the input leads 518 of
CONV_ROW block 502 via lines 525 and multiplexer mux1 510. The data values output from CONV_ROW
block 502 and present on output leads 520 are written into memory unit 116 via lines 532, multiplexer mux3
514 and leads 526. The next octave, octave 0, of the inverse transform proceeds in similar fashion except that
the data values output by CONV_ROW block 502 are the fully inverse-transformed video data which are sent
to memory unit 114 via lines 516 rather than to memory unit 116. Control block 506 provides control signals
to multiplexer mux1 510, multiplexer mux2 512, multiplexer mux3 514, CONV_ROW block 502, CONV_COL
block 504, DWT address generator block 508, memory unit 116, and memory unit 114 during this process.

In both forward wavelet transform and inverse wavelet transform operations, the control block 506 is timed
by the external video sync signals 201 received from A/D video decoder 110. Control block 506 uses these
sync signals as well as register input values ximage, yimage, and direction to generate the appropriate control
signals mentioned above. Control block 506 is coupled to: multiplexer mux1 510 via control leads 550,
multiplexer mux2 512 via control leads 552, multiplexer mux3 514 via control leads 554, CONV_ROW block 502 via
control leads 546, CONV_COL block 504 via control leads 548, DWT address generator block 508 via control
leads 534, 544, and 556, memory unit 116 via control leads 2108, and memory unit 114 via control leads 2106.

As shown in Figure 5, multiplexer mux1 510 couples one of the following three sets of input signals to input
leads 518 of CONV_ROW block 502, depending on the value of control signals on leads 550 supplied from
CONTROL block 506: digital video input data values received on lines 516 from A/D video decoder 110, data
values from memory unit 116 or data values from multiplexer mux3 514 received on lines 528, or data values
from CONV_COL block 504 received on lines 525. Multiplexer mux2 512 couples either the data values being
output from row convolver CONV_ROW block 502 or the data values being output from multiplexer mux3 514
received on lines 528 to input leads 522 of CONV_COL block 504, depending on the value of control signals
on leads 552 generated by CONTROL block 506. Multiplexer mux3 514 passes either the data values being
output from CONV_ROW 502 received on lines 532 or the data values being output from CONV_COL 504 onto
lines 528 and leads 526, depending on control signals generated by CONTROL block 506. Blocks CONV_ROW
502, CONV_COL 504, CONTROL 506, DWT address generator 508, and REGISTERS 536 of Figure 5 are
described below in detail in connection with a forward transformation of a matrix of digital image data values.
Lines 516, 532, 528 and 525 as well as input and output leads 518, 520, 522, 524 and 526 are each sixteen
bit parallel lines and leads.
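The multiplexer selections described for Figure 5 can be summarized as a small lookup, sketched here in Python. The function name `dwt_routing` and the descriptive strings are hypothetical labels, and the table records only what the text states: the selection of mux3 514 during the final inverse octave is not spelled out, so it is left as `None`.

```python
def dwt_routing(direction, octave):
    """Return the source selected by each multiplexer for one pass of the transform."""
    if direction == "forward":
        return {
            # Octave 0 takes fresh video data; later octaves re-read low pass values.
            "mux1": "video input" if octave == 0 else "memory unit 116",
            "mux2": "CONV_ROW output",
            "mux3": "CONV_COL output",
            "order": ("CONV_ROW", "CONV_COL"),
            "destination": "memory unit 116",
        }
    if direction == "inverse":
        final = octave == 0
        return {
            "mux1": "CONV_COL output",
            "mux2": "memory unit 116",
            # Assumption: left unspecified for the final octave, where the result
            # goes to memory unit 114 instead of back to memory unit 116.
            "mux3": None if final else "CONV_ROW output",
            "order": ("CONV_COL", "CONV_ROW"),
            "destination": "memory unit 114" if final else "memory unit 116",
        }
    raise ValueError("direction must be 'forward' or 'inverse'")
```

The table makes the symmetry visible: the inverse transform visits the two convolvers in the opposite order, and only the last inverse octave leaves memory unit 116.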

Figure 6 is a block diagram of the row convolver CONV_ROW block 502. Figure 7 is a block diagram of 
the column convolver CONV_COL block 504. Figure 21 is a block diagram of the CONTROL block 506 of Figure 
5. Figure 30 is a block diagram of the DWT address generator block 508 of Figure 5. 

As illustrated in Figure 6, CONV_ROW block 502 comprises a wavelet transform multiplier circuit 602, a
row wavelet transform circuit 604, a delay element 606, a multiplexer MUX 608, and a variable shift register
610. To perform a forward discrete wavelet transform, digital video values are supplied one-by-one to the
discrete wavelet transform circuit 122 of the video encoder/decoder chip 112 illustrated in Figure 1. In one
embodiment in accordance with the present invention, the digital video values are in the form of a stream of values
comprising 8-bit Y (luminance) values, followed by 8-bit U (chrominance) values, followed by 8-bit V
(chrominance) values. The digital video data values are input in "raster scan" form. For clarity and ease of explanation,
a forward discrete wavelet transform of an eight-by-eight matrix of luminance values Y as described is
represented by Table 1. Extending the matrix of Y values to a larger size is straightforward. If the matrix of Y values
is an eight-by-eight matrix, then the subsequent U and V matrices will each be four-by-four matrices.

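$$
\begin{array}{ccccc}
D_{00} & D_{01} & D_{02} & \cdots & D_{07} \\
D_{10} & D_{11} & D_{12} & \cdots & D_{17} \\
\vdots & \vdots & \vdots &        & \vdots \\
D_{70} & D_{71} & D_{72} & \cdots & D_{77}
\end{array}
$$

Table 1.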

The order of the Y values supplied to the discrete wavelet transform circuit 122 is D00, D01, ... D07 in the
first row, then D10, D11, ... D17 in the second row, and so forth row by row through the values in Table 1.
Multiplexer 510 in Figure 5 is controlled by control block 506 to couple this stream of data values to the row
convolver CONV_ROW block 502. The row convolver CONV_ROW block 502 performs a row convolution of the
row data values D00, D01, D02, ... D07 with a high pass four coefficient quasi-Daubechies digital filter
G = (d, c, -b, a) and a low pass four coefficient quasi-Daubechies digital filter H = (a, b, c, -d) where
a = 11/32, b = 19/32, c = 5/32, d = 3/32. The coefficients a, b, c, d are related to a four coefficient
Daubechies wavelet as described in the copending Patent Cooperation Treaty (PCT) application filed March 30,
1994, entitled "Data Compression and Decompression".
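
The stated coefficient values can be checked with a few lines of exact arithmetic. The following is only an illustrative sketch in Python (the names low_gain, high_gain and cross are introduced here and do not appear in the disclosure): it confirms that the low pass taps sum to one, that the high pass taps sum to zero, and that the two tap vectors are orthogonal.

```python
from fractions import Fraction

# Coefficients as stated in the text: a = 11/32, b = 19/32, c = 5/32, d = 3/32.
a, b, c, d = (Fraction(n, 32) for n in (11, 19, 5, 3))

H = (a, b, c, -d)   # low pass four coefficient filter
G = (d, c, -b, a)   # high pass four coefficient filter

low_gain = sum(H)                            # response of H to a constant input
high_gain = sum(G)                           # response of G to a constant input
cross = sum(h * g for h, g in zip(H, G))     # inner product of the two filters

print(low_gain, high_gain, cross)
```

A constant row of pixels therefore passes through the low pass filter unchanged and is removed entirely by the high pass filter.
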

The operation of CONV_ROW block 502 on the data values of Table 1 is explained with reference to Figures
6, 8, 9, 10 and 11. Figure 8 is a detailed block diagram of the wavelet transform multiplier circuit 602 of the
CONV_ROW block. Figure 9 is a detailed block diagram of the row wavelet transform circuit 604 of the
CONV_ROW block. Figure 10 shows a sequence of control signals supplied by the control block 506 of Figure
5 to the row wavelet transform circuit 604 of Figure 9. This sequence of control signals effects a forward
one-dimensional wavelet transform on the rows of the matrix of Table 1. The wavelet transform multiplier
circuit 602 of Figure 8 comprises combinatorial logic which multiplies each successive input data value x by
various scaled combinations of coefficients 32a, 32b, 32c, and 32d. This combinatorial logic block comprises
shift registers 802, 804, 806, and 808 which shift the multibit binary input data value x to the left by 1, 2,
3, and 4 bits, respectively. Various combinations of these shifted values, as well as the input value x itself,
are supplied to multibit adders 810, 812, 814, 816, and 818. The data outputs 32dx, 32(c-d)x, 32cx, 32ax,
32(a+b)x, 32bx, and 32(c+d)x are therefore available to the row wavelet transform circuit 604 on separate
sets of leads as shown in detail in Figures 6 and 9.
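
Because 32a = 11, 32b = 19, 32c = 5 and 32d = 3, all seven products can be formed from four left shifts and five additions. The Python sketch below shows one possible decomposition. It is an assumption made for illustration: the text states only that four shifters and five adders are used, and the actual wiring of Figure 8 may differ. The function name wavelet_multiplier is likewise hypothetical.

```python
def wavelet_multiplier(x):
    """Form the seven scaled products of x using only shifts and adds.

    The particular adder wiring is an assumed decomposition, not taken
    from Figure 8.
    """
    x2, x4, x8, x16 = x << 1, x << 2, x << 3, x << 4   # left shifts by 1..4 bits
    x3 = x + x2        # 32d     = 3
    x5 = x + x4        # 32c     = 5
    x11 = x3 + x8      # 32a     = 11
    x19 = x3 + x16     # 32b     = 19
    x30 = x11 + x19    # 32(a+b) = 30
    return {
        "32dx": x3,
        "32(c-d)x": x2,    # 32(c-d) = 2, a pure shift
        "32cx": x5,
        "32ax": x11,
        "32(a+b)x": x30,
        "32bx": x19,
        "32(c+d)x": x8,    # 32(c+d) = 8, a pure shift
    }


products = wavelet_multiplier(7)
```

Note that 32(b-a) = 8 is numerically equal to 32(c+d), so a product 32(b-a)x can be taken from the same output as 32(c+d)x.
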

The row wavelet transform circuit 604 of Figure 9 comprises sets of multiplexers, adders, and delay elements.
Multiplexer mux1 902, multiplexer mux2 904, and multiplexer mux3 906 pass selected ones of the data
outputs of the wavelet transform multiplier circuit 602 of Figure 8 as determined by control signals on leads
546 from CONTROL block 506 of Figure 5. These control signals on leads 546 are designated muxsel(1),
muxsel(2), and muxsel(3) on Figure 9. The remainder of the control signals on leads 546 supplied from CONTROL
block 506 to the row wavelet transform circuit 604 comprise andsel(1), andsel(2), andsel(3), andsel(4),
addsel(1), addsel(2), addsel(3), addsel(4), muxandsel(1), muxandsel(2), muxandsel(3), centermuxsel(1) and
centermuxsel(2).

Figure 10 shows values of the control signals at different times during a row convolution of the forward
transform. For example, at time t=0, the control input signal to multiplexer mux2 904, muxsel(2), is equal to 2.
Multiplexer mux2 904 therefore couples its second input leads carrying the value 32(a+b)x to its output leads.
Each of multiplexers 908, 910, 912, and 914 either passes the data value on its input leads, or passes a zero,
depending on the value of its control signal. Control signals andsel(1) through andsel(4) are supplied to select
input leads of multiplexers 908, 910, 912, and 914, respectively. Multiplexers 916, 918, and 920 have similar
functionality. The outputs of multiplexers 916, 918, and 920 depend on the values of control signals
muxandsel(1) through muxandsel(3), respectively. Multiplexers 922 and 924 pass either the value on their "left"
input leads or the value on their "right" input leads, as determined by control select inputs centermuxsel(1)
and centermuxsel(2), respectively. Adder/subtractors 926, 928, 930, and 932 either pass the sum or the
difference of the values on their left and right input leads, depending on the values of the control signals
addsel(1) through addsel(4), respectively. Elements 934, 936, 938, and 940 are one-cycle delay elements which
output the data values that were at their respective input leads during the previous time period.

Figure 11 is a diagram of a data flow through the row convolver CONV_ROW 502 during a forward transform
operation on the data values of Table 1 when the control signals 546 controlling the row convolver
CONV_ROW 502 are as shown in Figure 10. At the left hand edge of the matrix of the data values of Table 1,
start forward low pass and start forward high pass filters G_s and H_s are applied in accordance with equations
22 and 24 of copending Patent Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data
Compression and Decompression" as follows:

$$32H_{00} = 32\{(a + b)D_{00} + cD_{01} - dD_{02}\}$$
$$32G_{00} = 32\{(c + d)D_{00} - bD_{01} + aD_{02}\}$$

The row wavelet transform circuit of Figure 9 begins applying these start forward low and high pass filters when
the control signals for this circuit assume the values at time t=0 as illustrated in Figure 10.

At time t=0, muxsel(2) has a value of 2. Multiplexer mux2 904 therefore outputs the value 32(a+b)D00 onto
its output leads. Muxsel(3) has a value of 3 so multiplexer mux3 906 outputs the value 32(c+d)D00 onto its
output leads. Because the control signals andsel(2) and andsel(3) cause multiplexers 910 and 912 to output
zeros at t=0 as shown in Figure 10, the output leads of adder/subtractor blocks 928 and 930 carry the values
32(a+b)D00 and 32(c+d)D00, respectively, as shown in Figure 11. These values are supplied to the input leads
of delay elements 936 and 938. Delay elements 936 and 938 in the case of the row transform are one time unit
delay elements. The control signals centermuxsel(1) and centermuxsel(2) have no effect at t=0, because control
signals andsel(2) and andsel(3) cause multiplexers 910 and 912 to output zeros.

At time t=1, input data value x is the data value D01. Control signal muxsel(2) is set to 1 so that multiplexer
mux2 904 outputs the value 32bD01. The select signal centermuxsel(1) for multiplexer mux4 922 is set
to pass the value on its right input leads. The value 32(c+d)D00, the output of adder/subtractor block 930 at
t=0, is therefore passed through multiplexer mux4 922 due to the one time unit delay of delay element 938.
The control signal andsel(2) is set to pass, so the two values supplied to the adder/subtractor block 928 are
32(c+d)D00 and 32bD01. Because the control signal addsel(2) is set to subtract, the value output by
adder/subtractor block 928 is 32{(c+d)D00 - bD01} as shown in Figure 11. Similarly, with the values of control
signals centermuxsel(2), andsel(3), muxsel(3), muxandsel(2), and addsel(3) given in Figure 10, the value output
by adder/subtractor block 930 is 32{(a+b)D00 + cD01} as shown in Figure 11.

At time t=2, input data value x is data value D02. The control signals andsel(1), muxsel(1), and muxandsel(1)
are set so that the inputs to adder/subtractor block 926 are 32aD02 and 32{(c+d)D00 - bD01}. The value
32{(c+d)D00 - bD01} was the previous output from adder/subtractor block 928. Because control signal addsel(1)
is set to add as shown in Figure 10, the output of block 926 is 32{(c+d)D00 - bD01 + aD02} as shown in Figure 11.
Similarly, with the value of control signals addsel(4), andsel(4) and muxandsel(3), the value output by
adder/subtractor block 932 is 32{(a+b)D00 + cD01 - dD02} as shown in Figure 11.

As illustrated in Figure 10, output leads OUT2 (which are the output leads of delay element 940) carry a
value of 32H00 at time t=3. The value 32{(a+b)D00 + cD01 - dD02} output by block 932 one time period earlier
is equal to 32H00 as set forth above. Similarly, output leads OUT1 (which are the output leads
of delay element 934) carry a value of 32G00 at t=3 because output leads of block 926 have a value of
32{(c+d)D00 - bD01 + aD02} one time period earlier. Because 32H00 precedes 32G00 in the data stream comprising
the high and low pass components in a one-dimensional row convolution, delay element 606 is provided in
the CONV_ROW row convolver of Figure 6 to delay 32G00 so that 32G00 follows 32H00 on the leads which are
input to the multiplexer 608. Multiplexer 608 selects between the left and right inputs shown in Figure 6 as
dictated by the value mux_608, which is provided on one of the control leads 546 from control block 506. The
signal mux_608 is timed such that the value 32H00 precedes the value 32G00 on the output leads of multiplexer
608.

The output leads of multiplexer 608 are coupled to a variable shift register 610 as shown in Figure 6. The
function of the variable shift register 610 is to normalize the data values output from the CONV_ROW block
by shifting the value output by multiplexer 608 to the right by m_row bits. In this instance, for example, it is
desirable to divide the value output of multiplexer 608 by 32 to produce the normalized values H00 and G00. To
accomplish this, the value m_row provided by control block 506 via one of the control leads 546 is set to 5.
The general rule followed by the control block 506 of the discrete wavelet transform circuit is to: (1) set m_row
equal to 5 to divide by 32 during the forward transform, (2) set m_row equal to 4 to divide by 16 during the
middle of a row during an inverse transform, and (3) set m_row equal to 3 to divide by 8 when generating a
start or end value of a row during the inverse transform. In the example being described, the start values of a
transformed row during a forward transform are being generated, so m_row is appropriately set equal to 5.
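
The three-way rule for m_row can be restated compactly. The sketch below is illustrative Python only. The function names select_shift and normalize and the boolean arguments are assumptions introduced here, and Python's arithmetic right shift is used as a stand-in for the variable shift register 610.

```python
def select_shift(forward, boundary):
    """Return m_row according to the three-part rule stated in the text."""
    if forward:
        return 5                   # divide by 32 during the forward transform
    return 3 if boundary else 4    # inverse: divide by 8 at a row start/end, else by 16


def normalize(value, shift):
    """Model of the variable shift register: shift right by `shift` bits."""
    return value >> shift


h00 = normalize(32 * 7, select_shift(forward=True, boundary=True))
```
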

As illustrated in Figure 10, the centermuxsel(1) and centermuxsel(2) control signals alternate such that
the values on the right and the left input leads of multiplexers 922 and 924 are passed to their respective output
leads for each successive data value convolved. This reverses data flow through the adder/subtractor blocks
928 and 930 in alternating time periods. In time period t=2, for example, Figure 11 indicates that the value
32aD01 in the column designated "Output of Block 926" in time period t=1 is added to 32bD02 to form the value
32{aD01 + bD02} in the column designated "Output of Block 928" at time t=2. Then, in time period t=3, the value
32{dD01 + cD02} in the column designated "Output of Block 930" is combined with 32bD03 to form the value
32{dD01 + cD02 - bD03} in the column designated "Output of Block 928".

Accordingly, in time period t=2, the two values supplied to block 928 are 32bD02 and the previous output
from block 926, 32aD01. Because addsel(2) is set to add as shown in Figure 10, the value output by block 928
is 32(aD01 + bD02).

Similarly, the output of block 930 is 32(dD01 + cD02). In this way it can be seen the sequence of control
signals in Figure 10 causes the circuit of Figure 9 to execute the data flow in Figure 11 to generate, after
passage through multiplexer mux 608 and shift register 610 with m_row set equal to 5, the low and high pass
non-boundary components H01, G01, H02, and G02. To implement the end forward low and high pass filters beginning
at t=7 when the last data value of the first row of Table 1, D07, is input to the row convolver, the control signal
muxsel(2) is set to 3, so that 32(b-a)D07 is passed to block 928. Control signal muxsel(3) is set to 4, so that
32(c-d)D07 is passed to block 930. Control signal addsel(2) is set to subtract and control signal addsel(3) is
set to add. Accordingly, the output of adder/subtractor 928 is 32(dD05 + cD06 - (b-a)D07). Similarly, the output
of adder/subtractor 930 is 32(aD05 + bD06 + (c-d)D07).

As shown in Figure 11, these values are output from blocks 926 and 932 at the next time period when t=8
by setting muxandsel(1) and muxandsel(3) to be both zero so that adder/subtractor blocks 926 and 932 simply
pass the values unchanged. Delay elements 934 and 940 cause the values 32G03 and 32H03 to be output from
output leads OUT1 and OUT2 at time t=9. Multiplexer 608, as shown in Figure 6, selects between the output
of delay unit 606 and the OUT2 output as dictated by CONTROL block 506 of Figure 5. Shift register 610 then
normalizes the output as described previously, with m_row set equal to 5 for the end of the row. The resulting
values H03 and G03 are the values output by the end low pass and end high pass forward transform digital filters
in accordance with equations 26 and 28 of copending Patent Cooperation Treaty (PCT) application filed March
30, 1994, entitled "Data Compression and Decompression". Thus, a three coefficient start forward transform
low pass filter and a three coefficient start forward transform high pass filter have generated the values H00
and G00. A four coefficient quasi-Daubechies low pass forward transform filter and a four coefficient
quasi-Daubechies high pass forward transform filter have generated the values H01 ... G02. A three coefficient
end forward transform low pass filter and a three coefficient end forward transform high pass filter have
generated the values H03 and G03.
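
The arithmetic performed on one row by the start, non-boundary and end filters can be summarized in software. The Python sketch below is not a model of the circuit timing of Figures 10 and 11. It only reproduces the sums, using the integer multipliers 32a = 11, 32b = 19, 32c = 5 and 32d = 3 followed by a right shift of 5 bits. The function name forward_row is hypothetical, and the alignment of the four coefficient filters on values D(2k-1) through D(2k+2) is inferred from the partial sums quoted above rather than stated explicitly.

```python
def forward_row(row):
    """One-dimensional forward transform of an even-length row (length >= 4).

    Returns the interleaved sequence H0, G0, H1, G1, ...
    """
    n = len(row)
    if n < 4 or n % 2:
        raise ValueError("row length must be even and at least 4")
    out = []
    for k in range(n // 2):
        if k == 0:
            # Start filters: 32{(a+b), c, -d} and 32{(c+d), -b, a}
            x0, x1, x2 = row[0:3]
            h = 30 * x0 + 5 * x1 - 3 * x2
            g = 8 * x0 - 19 * x1 + 11 * x2
        elif k == n // 2 - 1:
            # End filters: 32{a, b, (c-d)} and 32{d, c, -(b-a)}
            x0, x1, x2 = row[n - 3:n]
            h = 11 * x0 + 19 * x1 + 2 * x2
            g = 3 * x0 + 5 * x1 - 8 * x2
        else:
            # Non-boundary filters H = (a, b, c, -d) and G = (d, c, -b, a)
            x0, x1, x2, x3 = row[2 * k - 1:2 * k + 3]
            h = 11 * x0 + 19 * x1 + 5 * x2 - 3 * x3
            g = 3 * x0 + 5 * x1 - 19 * x2 + 11 * x3
        out += [h >> 5, g >> 5]    # normalize with m_row = 5
    return out


flat = forward_row([32] * 8)
```

For a constant row every low pass output equals the input value and every high pass output is zero, which is a convenient sanity check on the boundary filters.
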

The same sequence is repeated for each of the rows of the matrix in Table 1. In this way, for each two
data values input there is one high pass (G) data value generated and there is one low pass (H) data value
generated. The resulting output data values of CONV_ROW block 502 are shown in Figure 12.

As illustrated in Figure 5, the values output from row convolver CONV_ROW block 502 are passed to the
column convolver CONV_COL block 504 in order to perform column convolution using the same filters in
accordance with the method set forth in copending Patent Cooperation Treaty (PCT) application filed March 30,
1994, entitled "Data Compression and Decompression".

Figure 7 is a block diagram of the column convolver CONV_COL block 504 of Figure 5. The CONV_COL
block 504 comprises a wavelet transform multiplier circuit 702, a column wavelet transform circuit 704, a
multiplexer 708, and a variable shift register 710. In general, the overall operation of the circuit shown in
Figure 7 is similar to the overall operation of the circuit shown in Figure 6. The wavelet transform multiplier
circuit 702 of the column convolver is identical to the wavelet transform multiplier circuit 602 of Figure 6.
The dashed line in Figure 8, therefore, is designated with both reference numerals 602 and 702.

Figure 13 is a detailed block diagram of the column wavelet transform circuit 704 of Figure 7 of the column
convolver. The CONV_COL block 504, as shown in Figure 13, is similar to the CONV_ROW block 502, except
that the unitary delay elements 934, 936, 938, and 940 of the CONV_ROW block 502 are replaced by "line
delay" blocks 1334, 1336, 1338, and 1340, respectively. The line delay blocks represent a time delay of one
row which, in the case of the matrix of the presently described example, is eight time units. In some
embodiments in accordance with the present invention, the line delays are realized using random access memory
(RAM).
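
A line delay of this kind behaves like a fixed-length circular buffer: the value read out at each time unit is the value written one row earlier. The Python sketch below illustrates that behaviour under the assumption of a simple read-then-write RAM access per time unit. The class name LineDelay and its interface are hypothetical and are not taken from the disclosure.

```python
class LineDelay:
    """Fixed delay of `length` time units, modelled as a circular buffer."""

    def __init__(self, length):
        self.buf = [0] * length    # stands in for the RAM holding one row
        self.pos = 0               # current read/write location

    def step(self, value):
        """Read the value stored `length` steps ago, then overwrite it."""
        out = self.buf[self.pos]
        self.buf[self.pos] = value
        self.pos = (self.pos + 1) % len(self.buf)
        return out


delay = LineDelay(8)                          # one row of the 8-by-8 example
outputs = [delay.step(t) for t in range(24)]  # feed a raster-scan stream 0, 1, 2, ...
```

With a raster-scan input and a delay of eight, the value leaving the delay at any time is the value located directly above the current input in the matrix, which is what a column filter needs.
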

To perform a column convolution on the values of the matrix of Figure 12, the first three values H00, H10,
H20 of the first column are processed to generate, after a bit shift in shift register 710 of Figure 7, low and high
pass values HH00 and HG00 of Figure 16. The first three values G00, G10, G20 of the second column of the matrix
of Figure 12 are then processed to likewise produce GH00 and GG00, and so on, to produce the top two rows
of values of the matrix of Figure 16. Three values in each column are processed because the start low and
high pass filters are three coefficient filters rather than four coefficient filters.

Figure 14 is a diagram illustrating control signals which control the column convolver during the forward
transform of the data values of Figure 12. Figure 15 is a diagram illustrating data flow through the column
convolver. Corresponding pairs of data values are output from line delays 1334 and 1340 of the column wavelet
transform circuit 704. For this reason, the low pass filter output values are supplied from the output leads of
the adder/subtractor block 1332 at the input leads of line delay 1340 rather than from the output leads of the
line delay 1340 so that a single transformed data value is output from the column wavelet transform circuit in
each time period. In Figure 14, output data values 32HH00 ... 32GH03 are output during time periods t=16 to
t=23 whereas output data values 32HG00 ... 32GG03 are output during time periods t=24 to t=31, one line delay
later. After being passed through multiplexer 708 and variable shift register 710 of Figure 7, the column
convolved data values HH00 ... GH03 and HG00 ... GG03 are written to memory unit 116 under the control of the
address generator. After all the data values of Figure 16 are written to memory unit 116, an octave 0 sub-band
decomposition exists in memory unit 116.

To perform the next octave of decomposition, only the low pass component HH values in memory unit 116
are processed. The HH values are read from memory unit 116 and passed through the CONV_ROW block
502 and CONV_COL block 504 as before, except that the control signals for control block 506 are modified
to reflect the smaller matrix of data values being processed. The line delay in the CONV_COL block 504 is
also shortened to four time units because there are now only four low pass component HH values per row.
The control signals to accomplish the octave 1 forward row transform on the data values in Figure 16 are shown
in Figure 17. The corresponding data flow for the octave 1 forward row transform is shown in Figure 18.
Likewise, the control signals to accomplish the octave 1 forward column transform are shown in Figure 19, and
the corresponding data flow for the octave 1 forward column transform is shown in Figure 20.

The resulting HHHH, HHHG, HHGH, and HHGG data values output from the column convolver
CONV_COL block 504 are sent to memory unit 116 to overwrite only the locations in memory unit 116 storing
corresponding HH data values as explained in connection with Figures 17 and 18 of copending Patent
Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data Compression and Decompression". The
result is an octave 1 sub-band decomposition stored in memory unit 116. This process can be performed on
large matrices of data values to generate sub-band decompositions having as many octaves as required. For
ease of explanation and illustration, control inputs and dataflow diagrams are not shown for the presently
described example for octaves higher than octave 1. However, control inputs and dataflows for octaves 2 and
above can be constructed given the method described in copending Patent Cooperation Treaty (PCT)
application filed March 30, 1994, entitled "Data Compression and Decompression" along with the octave 0 and
octave 1 implementation of that method described above.
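
The in-place, octave-by-octave character of the decomposition can be illustrated with a short software sketch. The Python below is a simplification made under stated assumptions: it operates on a square list of lists rather than on a raster-scan stream, it uses a stride of 2^octave to pick out the low pass values of the previous octave, and it requires at least four values per row in every octave. The names START, MID, END, transform_1d and decompose are hypothetical.

```python
# Integer tap tables (each scaled by 32): (low pass taps, high pass taps).
START = ((30, 5, -3), (8, -19, 11))          # three coefficient start filters
MID = ((11, 19, 5, -3), (3, 5, -19, 11))     # four coefficient non-boundary filters
END = ((11, 19, 2), (3, 5, -8))              # three coefficient end filters


def transform_1d(v):
    """Forward transform of an even-length list; returns H0, G0, H1, G1, ..."""
    n = len(v)
    if n < 4 or n % 2:
        raise ValueError("length must be even and at least 4")
    out = []
    for k in range(n // 2):
        if k == 0:
            taps, xs = START, v[0:3]
        elif k == n // 2 - 1:
            taps, xs = END, v[n - 3:n]
        else:
            taps, xs = MID, v[2 * k - 1:2 * k + 3]
        for t in taps:
            out.append(sum(c * x for c, x in zip(t, xs)) >> 5)
    return out


def decompose(m, octaves):
    """In-place multi-octave decomposition of a square matrix m."""
    size = len(m)
    for octave in range(octaves):
        idx = list(range(0, size, 2 ** octave))   # positions of this octave's inputs
        for r in idx:                             # row pass
            new = transform_1d([m[r][c] for c in idx])
            for c, val in zip(idx, new):
                m[r][c] = val
        for c in idx:                             # column pass
            new = transform_1d([m[r][c] for r in idx])
            for r, val in zip(idx, new):
                m[r][c] = val
    return m


image = decompose([[32] * 8 for _ in range(8)], octaves=2)
```

Each octave overwrites only the locations that held the low pass values of the previous octave, so no second image-sized buffer is needed in this sketch.
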

Figure 21 illustrates a block diagram of one possible embodiment of control block 506 of Figure 5. Control
block 506 comprises a counter 2102 and a combinatorial logic block 2104. The control signals for the forward
and inverse discrete wavelet transform operations, as shown in Figures 10, 14, 17, 19, 22, 24, 26, and 28, are
output onto the output leads of the combinatorial logic block 2104. The input signals to the control block 506
comprise the sync leads 201 which are coupled to A/D video decoder 110, the direction lead 538 which is
coupled to REGISTERS block 536, and the image size leads 540 and 542 which are also coupled to REGISTERS
block 536. The values of the signals on the register leads 538, 540, and 542 are downloaded to REGISTERS
block 536 of the video encoder/decoder chip 112 from data bus 106 via register download bus 128. The output
leads of control block 506 comprise CONV_ROW control leads 546, CONV_COL control leads 548, DWT control
leads 550, 552, and 554, memory control leads 2106 and 2108, DWT address generator mux control lead
556, DWT address generator read control leads 534, and DWT address generator write control leads 544.

Counter block 2102 generates the signals row_count, row_carry, col_count, col_carry, octave, and channel,
and provides these signals to combinatorial logic block 2104. Among other operations, counter 2102
generates the signals row_count and row_carry by counting the sequence of data values from 0 up to ximage,
where ximage represents the horizontal dimension of the image received on leads 540. Similarly, counter 2102
generates the signals col_count and col_carry by counting the sequence of data values from 0 up to yimage,
where yimage represents the vertical dimension of the image received on leads 542. The inputs to
combinatorial logic block 2104 comprise the outputs of counter block 2102 as well as the inputs direction,
ximage, yimage and sync to control block 506. The output control sequences of combinatorial logic block 2104
are combinatorially generated from the signals supplied to logic block 2104.

After the Y data values of an image have been transformed, the chrominance components U and V of the
image are transformed. In the presently described specific embodiment of the present invention, a 4:1:1 format
of Y:U:V values is used. Each of the U and V matrices of data values comprises half the number of rows and
columns as does the Y matrix of data values. The wavelet transform of each of these components of
chrominance is similar to the transformation of the Y data values except the line delays in the CONV_COL are
shorter to accommodate the shorter row length and the size of the matrices corresponding to the matrix of
Table 1 is smaller.

Not only does the discrete wavelet transform circuit of Figure 5 transform image data values into a
multi-octave sub-band decomposition using a forward discrete wavelet transformation, but the discrete wavelet
transform circuit of Figure 5 can be used to perform a discrete inverse wavelet transform on transformed-image
data to convert a sub-band decomposition back into the image domain. In one octave of an inverse discrete
wavelet transform, the inverse column convolver 504 of Figure 5 operates on transformed-image data values
read from memory unit 116 via leads 526, lines 528 and multiplexer mux2 512 and the inverse row convolver
502 operates on the data values output by the column convolver supplied via leads 524, lines 525 and
multiplexer mux1 510.

Figures 22 and 23 show control signals and data flow for the column convolver 504 of Figure 5 when column
convolver 504 performs an inverse octave 1 discrete wavelet transform on transformed-image data located
in memory unit 116. As illustrated in Figure 23, the data value output from adder/subtractor block 1326 of Figure
13 at time t=4 is 32{(b-a)HHHH00 + (c-d)HHHG00}. The column convolver therefore processes the first two
values HHHH00 and HHHG00 in accordance with the two coefficient start reconstruction filter (inverse transform
filter) set forth in equation 52 of copending Patent Cooperation Treaty (PCT) application filed March 30, 1994,
entitled "Data Compression and Decompression". Subsequently, blocks 1332 and 1326 output values
indicating that the column convolver performs the four coefficient odd and even reconstruction filters
(interleaved inverse transform filters) of equations 20 and 19 of copending Patent Cooperation Treaty (PCT)
application filed March 30, 1994, entitled "Data Compression and Decompression". Fig. 23 illustrates that the
column convolver performs the two coefficient end reconstruction filter (inverse transform filter) on the last
two data values HHHH10 and HHHG10 (see time t=20) of the first column of transformed data values in
accordance with equation 59 of copending Patent Cooperation Treaty (PCT) application filed March 30, 1994,
entitled "Data Compression and Decompression". The data values output from the column convolver of Figure 13
are supplied to the row convolver 502 of Figure 5 via lines 525 and multiplexer mux1 510.
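
The start reconstruction step can be tried numerically. The Python sketch below pairs the three coefficient start forward filters with the two coefficient expression 32{(b-a)H + (c-d)G} quoted above, which equals 8H + 2G in integer form. It assumes that the result is normalized by a right shift of 3 bits, following the rule given for m_row at the start of a row in the inverse transform. Applying that rule to the column convolver, and the function names start_forward and start_reconstruct, are assumptions made for illustration. The four coefficient odd and even reconstruction filters are not reproduced here because their coefficients are given only in the copending application.

```python
def start_forward(d0, d1, d2):
    """Start forward filters: returns (H, G) normalized by a 5-bit right shift."""
    h = (30 * d0 + 5 * d1 - 3 * d2) >> 5     # 32{(a+b), c, -d}
    g = (8 * d0 - 19 * d1 + 11 * d2) >> 5    # 32{(c+d), -b, a}
    return h, g


def start_reconstruct(h, g):
    """Two coefficient start reconstruction: 32{(b-a)h + (c-d)g} >> 3."""
    return (8 * h + 2 * g) >> 3              # 32(b-a) = 8, 32(c-d) = 2


h, g = start_forward(100, 104, 108)
d0_approx = start_reconstruct(h, g)
```

For smooth inputs such as the one used here the reconstructed value is close to, but not necessarily identical to, the original first sample, which agrees with the description of the inverse transformed values as substantially the same as the original data values.
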

Figures 24 and 25 show control signals and data flow for the row convolver 502 of Figure 5 when the row
convolver performs an inverse octave 1 discrete wavelet transform on the data values output from the column
convolver. The column convolver 504 has received transformed values HHHH00 ... HHGH01 and so forth as
illustrated in Fig. 23 and generated the values HHH00 ... HHG01 and so forth, as illustrated in Fig. 22, onto
output leads 524. Row convolver 502 receives the values HHH00 ... HHG01 and so forth as illustrated in Fig. 24
and generates the values HH00, HH01, HH02 and so forth as illustrated in Fig. 24 onto output leads 520 of row
convolver 502. The data flow of Fig. 25 indicates that the row convolver performs the start reconstruction filter
on the first two data values of a row, performs the odd and even reconstruction filters on subsequent
non-boundary data values, and performs the end reconstruction filter on the last two data values of a row. The
HH data values output from row convolver 502 are written to memory unit 116 into the memory locations
corresponding with the HH data values shown in Fig. 16.

To inverse transform the octave 0 data values in memory unit 116 into the image domain, the column
convolver 504 and the row convolver 502 perform an inverse octave 0 discrete wavelet transform. Figures 26 and
27 show the control signals and the data flow for the column convolver 504 of Figure 5 when the column
convolver performs an inverse octave 0 discrete wavelet transform on transformed image data values in memory
unit 116. The data values output from the column convolver are then supplied to the row convolver 502 of Figure
5 via lines 525 and multiplexer mux1 510.

Figures 28 and 29 show control signals and data flow for the row convolver 502 of Figure 5 when the row
convolver performs an inverse octave 0 discrete wavelet transform on the data output from the column
convolver to inverse transform the transformed-image data back to the image domain. Column convolver 504
receives transformed values HH00 ... GH03 and so forth as illustrated in Fig. 27 and generates the values H00 ...
G03 and so forth as illustrated in Fig. 26 onto output leads 524. Row convolver 502 receives the values H00 ...
G03 and so forth, as illustrated in Fig. 28, and generates the inverse transformed data values D00, D01, D02 ...
D07 and so forth, as illustrated in Fig. 28, onto output leads 520 of row convolver 502. The inverse transformed
data values output from row convolver 502 are written to memory unit 114.

The control signals and the data flows of Figures 22, 23, 24, 25, 26, 27, 28 and 29 comprise the inverse
transformation from octave 1 to octave 0 and from octave 0 back into image domain inverse transformed data
values which are substantially the same as the original data values of the matrix of Table 1. The control signals
which control the row convolver and column convolver to perform the inverse transform are generated by
control block 506. The addresses and control signals used to read data values from and write data values to
memory units 116 and 114 are generated by the DWT address generator block 508 under the control of control
block 506.

After the inverse wavelet transform of the Y matrix of transformed data values is completed, the U and V 
matrices of transformed data values are inverse transformed one after the other in a similar way to the way 
the Y matrix was inverse transformed. 

Figure 30 is a block diagram of the DWT address generator block 508 of Figure 5. The DWT address generator
block 508 supplies read and/or write addresses to the memory units 116 and 114 for each octave of the
forward and inverse transform. The DWT address generator block 508 comprises a read address generator
portion and a write address generator portion. The read address generator portion comprises multiplexer 3006,
adder 3010, multiplexer 3002, and resettable delay element 3014. The write address generator portion likewise
comprises multiplexer 3008, adder 3012, multiplexer 3004, and resettable delay element 3016. The DWT address
generator is coupled to the control block 506 via control leads 534, 556, and 544, to memory unit 116
via address leads 3022, and to memory unit 114 via address leads 3020. The input leads of DWT address
generator 508 comprise the DWT address generator read control leads 534, the DWT address generator write
control leads 544, and the mux control lead 556. The DWT address generator read control leads 534, in turn,
comprise 6 leads which carry the values col_end_R, channel_start_R, reset_R, oct_add_factor_R, incr_R,
base_u_R, and base_v_R. The DWT address generator write control leads 544, in turn, comprise leads which
carry the values col_end_W, channel_start_W, reset_W, oct_add_factor_W, incr_W, base_u_W, and
base_v_W. All signals contained on these leads are provided by control block 506. The output leads of DWT
address generator block 508 comprise address leads 3022 which provide address information to memory unit
116, and address leads 3020 which provide address information to memory unit 114. The addresses provided
on leads 3022 can be either read or write addresses, depending on the cycle of the DWT transform circuit 122
as dictated by control signal muxcontrol provided by control block 506 on lead 556. The addresses provided
on leads 3020 are write-only addresses, because memory unit 114 is only written to by the DWT transform
circuit 122.

Memory locations of a two-dimensional matrix of data values such as the matrices of Table 1, Figure 12
and Figure 16 may have memory location addresses designated 0, 1, 2 and so forth, the addresses increasing
by one left to right across each row and increasing by one to skip from the right most memory location at the
end of a row to the left most memory location of the next lower row. To address successive data values in a
matrix of octave 0 data values, the address is incremented by one to read each new data value D from the
matrix.

For octave 1, addresses are incremented by two because the HH values are two columns apart as illustrated
in Figure 16. The row number, however, is incremented by two rather than one because the HH values
are located on every other row. The DWT address generator 508 in octave 1 therefore increments by two until
the end of a row is reached. The DWT address generator then increments once by ximage + 2 as can be seen
from Figure 16. For example, the last HH value in row 0 of Figure 16 is $HH_{03}$ at memory address 6, assuming
$HH_{00}$ has an address of 0 and that addresses increment by one from left to right, row by row, through the data
values of the matrix. The next HH value is in row two, $HH_{10}$, at memory address 16. The increment factor in a
row is therefore incr = $2^{octave}$. The increment factor at the end of a row is
oct_add_factor = $(2^{octave} - 1) \cdot ximage + 2^{octave}$ for octave $\ge 0$, where ximage is the x dimension of the image.
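The two increment factors can be checked with a short software sketch. The function names `jump_factors` and `octave_addresses` are hypothetical and the loop is only a model of the address sequence, not of the circuit of block 508; it assumes the matrix starts at address 0 and that ximage and yimage are divisible by $2^{octave}$.

```python
def jump_factors(octave, ximage):
    """Return (incr, oct_add_factor) for one octave, using the formulas in the text."""
    incr = 1 << octave
    oct_add_factor = ((1 << octave) - 1) * ximage + (1 << octave)
    return incr, oct_add_factor


def octave_addresses(octave, ximage, yimage):
    """List the addresses visited for one octave of an ximage-by-yimage matrix."""
    incr, oct_add_factor = jump_factors(octave, ximage)
    per_row = ximage >> octave   # values visited in each visited row
    rows = yimage >> octave      # number of rows visited
    addresses = []
    addr = 0
    for _ in range(rows):
        for col in range(per_row):
            addresses.append(addr)
            # Step by incr inside a row, by oct_add_factor at the end of a row.
            addr += oct_add_factor if col == per_row - 1 else incr
    return addresses
```

For an 8-column matrix in octave 1 this reproduces the example of Figure 16: the addresses run 0, 2, 4, 6 and then jump by ximage + 2 = 10 to address 16.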

In some embodiments, the transformed Y data values are stored in memory unit 116 from addresses 0
through (ximage * yimage - 1), where yimage is the y dimension of the matrix of the Y data values. The
transformed U data values are then stored in memory unit 116 from address base_u up to base_v - 1, where:

$$base\_u = ximage \cdot yimage$$

$$base\_v = ximage \cdot yimage + \frac{ximage \cdot yimage}{4}$$

Similarly, the transformed V data values are stored in memory unit 116 at addresses beginning at address
base_v.
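As a worked example of this memory layout, the following sketch computes the two base addresses. It assumes, as the base_v expression suggests, that the U matrix holds one quarter as many values as the Y matrix; the function name `channel_bases` is hypothetical.

```python
def channel_bases(ximage, yimage):
    """Return (base_u, base_v) for Y, U and V stored back to back in one memory."""
    y_size = ximage * yimage   # Y occupies addresses 0 .. y_size - 1
    base_u = y_size            # U starts immediately after Y
    base_v = base_u + y_size // 4   # assumed: U holds a quarter as many values as Y
    return base_u, base_v
```

For a 16-by-16 Y matrix, Y would occupy addresses 0 to 255, U addresses 256 to 319, and V would begin at address 320.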

The operation of the read address generator portion in Figure 30 is representative of both the read and
write portions. In operation, multiplexer base_mux 3002 of Figure 30 sets the read base addresses to be 0 for
the Y channel, base_u_R for the U channel, and base_v_R for the V channel. Multiplexer 3002 is controlled
by the control signal channel_start_R, which signifies when each of the Y, U and V channels starts. Multiplexer
mux 3006 sets the increment factor to be incr_R or, at the end of each row, oct_add_factor_R. The selected
increment factor is supplied to adder 3010, which adds the increment factor to the current address present on
the output leads of delay element 3014 so as to generate the next read address, next_addr_R. The next read
address next_addr_R is then stored in the delay element 3014.

In some embodiments in accordance with the present invention, tables of incr_R and oct_add_factor_R
for each octave are downloaded to REGISTERS block 536 on the video encoder/decoder chip 112 at
initialization via download registers bus 128. These tables are passed to the control block 506 at initialization. To
clarify the illustration, the leads which connect REGISTERS block 536 to control block 506 are not included
in Figure 5. In other embodiments, values of incr_R and oct_add_factor_R are precalculated in hardware from
the value of ximage using a small number of gates located on-chip. Because the U and V matrices have half
the number of columns as the Y matrix, the U and V jump tables are computed with ximage replaced by
ximage/2, a one bit shift. Because the tree encoder/decoder restricts ximage to be a multiple of
$2^{OCT+1} > 2^{octave}$, the addition of $2^{octave}$ in the oct_add_factor is, in fact, concatenation. Accordingly,
only the factor $(2^{octave} - 1) \cdot ximage$ must be calculated and downloaded. The jump tables for the U and V
addresses can be obtained from the Y addresses by shifting this factor one bit to the right and then
concatenating with $2^{octave}$. Accordingly, appropriate data values of a matrix can be read from a memory
storing the matrix and processed data values can be written back into the matrix in the memory to the
appropriate memory locations.
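The claim that the addition reduces to concatenation can be illustrated in software. The sketch below is not the on-chip logic; the function names are hypothetical, and it assumes OCT = 3 so that ximage is a multiple of 16 and octave ranges over 0 to 2.

```python
def oct_add_factor_by_addition(octave, ximage):
    """Direct evaluation of (2^octave - 1) * ximage + 2^octave."""
    return ((1 << octave) - 1) * ximage + (1 << octave)


def oct_add_factor_by_concatenation(octave, ximage):
    """Same value, formed by ORing 2^octave into the empty low-order bits."""
    high = ((1 << octave) - 1) * ximage   # the only term that must be calculated
    return high | (1 << octave)           # low bits of high are zero, so OR equals add


def oct_add_factor_uv(octave, ximage):
    """U/V factor: shift the Y factor right by one bit, then concatenate 2^octave."""
    high = ((1 << octave) - 1) * ximage
    return (high >> 1) | (1 << octave)
```

Under the stated assumption the low four bits of the calculated factor are always zero, which is why a bitwise OR and an addition give the same result, and why the U and V factor equals the formula evaluated with ximage/2.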

Figures 31 and 32 are block diagrams of one embodiment of the tree processor/encoder-decoder circuit 124
of Figure 1. Figure 31 illustrates the circuit in encoder mode and Figure 32 illustrates the circuit in decoder mode.
Tree processor/encoder-decoder circuit 124 comprises the following blocks: DECIDE block 3112, TP_ADDR_GEN
block 3114, quantizer block 3116, MODE_CONTROL block 3118, Huffman encoder-decoder block 3120, buffer
block 3122, CONTROL_COUNTER block 3124, delay element 3126, delay element 3128, and
VALUE_REGISTERS block 3130.

The tree processor/encoder-decoder circuit 124 is coupled to FIFO buffer 120 via input/output data leads
130. The tree processor/encoder-decoder circuit 124 is coupled to memory unit 116 via an old frame data bus
3102, a new frame data bus 3104, an address bus 3106, and memory control buses 3108 and 3110. The
VALUE_REGISTERS block 3130 of the tree processor/encoder-decoder circuit 124 is coupled to data bus 106 via
a register download bus 128. Figures 31 and 32 illustrate the same physical hardware; the encoder and decoder
configurations of the hardware are shown separately for clarity. Although two data buses 3104 and 3102 are
illustrated separately in Figure 31 to facilitate understanding, the new and old frame data buses may actually
share the same pins on video encoder/decoder chip 112 so that the new and old frame data are time multiplexed
on the same leads 526 of memory unit 116 as illustrated in Figure 5. Control buses 3108 and 3110 of Figure
31 correspond with the control lines 2108 in Figure 5. The DWT address generator block 508 of the discrete
wavelet transform circuit 122 and the tree processor address generator block 3114 of the tree
processor/encoder-decoder circuit 124 both access memory unit 116 and may therefore use the same physical
address, data and control lines.

Figure 33 illustrates an embodiment of DECIDE block 3112. A function of DECIDE block 3112 is to receive
a two-by-two block of data values from memory unit 116 for each of the old and new frames and, from these
two-by-two blocks of data values and from the signals on leads 3316, 3318, 3320 and 3322, to generate seven
flags present on leads 3302, 3304, 3306, 3308, 3310, 3312 and 3314. The MODE_CONTROL block 3118 uses
these flags as well as values from VALUE_REGISTERS block 3130 supplied via leads 3316, 3318 and 3320
to determine the mode in which the new two-by-two block will be encoded. The addresses in memory unit 116
at which the data values of the new and old two-by-two blocks are located are determined by the address
generator TP_ADDR_GEN block 3114.

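The input signal on register lead 3316 is the limit value output from VALUE_REGISTERS block 3130. The
input signal on register leads 3318 is the qstep value output from VALUE_REGISTERS block 3130. The input
signal on register lead 3320 is the compare value output from VALUE_REGISTERS block 3130. The input
signal on register lead 3322 is the octave value generated by TP_ADDR_GEN block 3114 as a function of the
current location in the tree of the sub-band decomposition. As described in copending Patent Cooperation
Treaty (PCT) application filed March 30, 1994, entitled "Data Compression and Decompression" at equations
62-71, the values of the flags new_z, nz_flag, origin, noflag, no_z, oz_flag, and motion, produced on leads 3302,
3304, 3306, 3308, 3310, 3312, and 3314, respectively, are determined in accordance with the following
equations:

$$\mathit{no} = \sum_{0 \le x,\,y \le 1} \bigl|\mathit{new}[x][y] - \mathit{old}[x][y]\bigr| \qquad \text{(equ. 3)}$$
$$\mathit{nz\_flag} = \mathit{nz} < \mathit{limit} \qquad \text{(equ. 4)}$$
$$\mathit{noflag} = \mathit{no} < \mathit{compare} \qquad \text{(equ. 5)}$$
$$\mathit{origin} = \mathit{nz} \le \mathit{no} \qquad \text{(equ. 6)}$$
$$\mathit{motion} = ((\mathit{nz} + \mathit{oz}) \ll \mathit{octave}) \le \mathit{no} \qquad \text{(equ. 7)}$$
$$\mathit{new\_z} = |\mathit{new}[x][y]| < \mathit{qstep}, \quad \text{for } 0 \le x,\,y \le 1 \qquad \text{(equ. 8)}$$
$$\mathit{no\_z} = |\mathit{new}[x][y] - \mathit{old}[x][y]| < \mathit{qstep}, \quad \text{for } 0 \le x,\,y \le 1 \qquad \text{(equ. 9)}$$
$$\mathit{oz\_flag} = \mathit{old}[x][y] = 0, \quad \text{for all } 0 \le x,\,y \le 1 \qquad \text{(equ. 10)}$$

Here nz and oz denote the sums of |new[x][y]| and |old[x][y]|, respectively, over the two-by-two block; these
are the values produced by summation blocks 3332 and 3334 of the DECIDE block.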
The DECIDE block 3112 comprises subtracter block 3324, absolute value (ABS) blocks 3326, 3328, and
3330, summation blocks 3332, 3334, and 3336, comparator blocks 3338, 3340, 3342, 3344, 3346, 3350, and
3352, adder block 3354, and shift register block 3356. The value output by ABS block 3326 is the absolute
value of the data value new[x][y] on leads 3104. Similarly, the value output by ABS block 3328 is the absolute
value of the data value old[x][y] on leads 3102. The value output by ABS block 3330 is the absolute value of
the difference between the data values new[x][y] and old[x][y]. Comparator 3338, coupled to the output leads
of ABS block 3326, deasserts the new_z flag on output lead 3302 if qstep is less than the value output by block
3326. Block 3332 sums the last four values output from block 3326 and the value output by block 3332 is
supplied to comparator block 3340. Comparator block 3340 compares this value to the value of limit on lead
3316. The flag nz_flag is asserted on lead 3304 if limit is greater than or equal to the value output by block 3332.
This value corresponds to nz_flag in equation 4. Summation block 3334 similarly sums the four most recent
values output by block 3328. The values output by blocks 3332 and 3334 are added together by block 3354,
the values output by block 3354 being supplied to shift register block 3356. The shift register block 3356 shifts
the value received to the left by octave bits. Summation block 3336 adds the four most recent values output
by block 3330. Comparator block 3342 compares the value output by block 3332 to the value output by block
3336 and asserts the motion flag in accordance with equation 7. The origin flag on output lead 3306 is asserted
when the value output by block 3332 is less than the value output by block 3336. This value corresponds to
origin in equation 6 above. The value output by block 3336 is compared to the value compare by block 3344
such that flag noflag is asserted when compare is greater than the value output from block 3336. Block 3346
compares the value output by block 3330 to the value qstep such that flag no_z is deasserted when qstep is less.
This corresponds to flag no_z in equation 9. The old input value on leads 3102 is compared to the value 0 by
block 3350 such that flag oz_flag on lead 3312 is asserted when each of the values of the old block is equal
to 0. This corresponds to oz_flag in equation 10 above. The seven flags produced by the DECIDE block of
Figure 33 are passed to the MODE_CONTROL block 3118 to determine the next mode.
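The flag computation of equations 3 to 10 can be summarized in a few lines of software. This is a sketch under stated assumptions rather than a model of the comparator hardware: the function name `decide_flags` is hypothetical, nz and oz are taken to be the sums of |new| and |old| over the block (the outputs of summation blocks 3332 and 3334), the comparisons in equations 6 and 7 are read as "less than or equal", and the per-value conditions of equations 8 and 9 are assumed to hold for all four values of the block.

```python
def decide_flags(new, old, limit, qstep, compare, octave):
    """Compute the seven DECIDE flags for one two-by-two block.

    new and old are 2x2 nested lists of integers indexed as block[x][y].
    """
    cells = [(x, y) for x in range(2) for y in range(2)]
    nz = sum(abs(new[x][y]) for x, y in cells)               # summation block 3332
    oz = sum(abs(old[x][y]) for x, y in cells)               # summation block 3334
    no = sum(abs(new[x][y] - old[x][y]) for x, y in cells)   # equ. 3, block 3336
    return {
        "new_z": all(abs(new[x][y]) < qstep for x, y in cells),               # equ. 8
        "nz_flag": nz < limit,                                                # equ. 4
        "origin": nz <= no,                                                   # equ. 6
        "noflag": no < compare,                                               # equ. 5
        "no_z": all(abs(new[x][y] - old[x][y]) < qstep for x, y in cells),    # equ. 9
        "oz_flag": all(old[x][y] == 0 for x, y in cells),                     # equ. 10
        "motion": ((nz + oz) << octave) <= no,                                # equ. 7
    }
```

For example, a block whose new values are all zero and whose old values are all zero asserts every flag, whereas a single large new value against an all-zero old block leaves only origin and oz_flag asserted when octave is 1.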

The tree processor/encoder-decoder circuit 124 of Figure 31 comprises delay elements 3126 and 3128.
Delay element 3126 is coupled to the NEW portion of memory unit 116 via new frame data bus 3104 to receive
the value new[x][y]. Delay element 3128 is coupled to the OLD portion of memory unit 116 via old frame data
bus 3102 to receive the value old[x][y]. These delay elements, which in some embodiments of the invention
are implemented in static random access memory (SRAM), serve to delay their respective input values read
from memory unit 116 for four cycles before the values are supplied to quantizer block 3116. This delay is
needed because the DECIDE block 3112 introduces a four-cycle delay in the dataflow as a result of needing to
read the four most recent data values before the new mode in which those data values will be encoded is
determined. The delay elements therefore synchronize signals supplied to quantizer block 3116 by the
MODE_CONTROL block 3118 with the values read from memory unit 116 which are supplied to quantizer block 3116.

The tree processor/encoder-decoder circuit 124 of Figures 31 and 32 comprises a VALUE_REGISTERS
block 3130. The VALUE_REGISTERS block 3130 serves the function of receiving values from an external
source and asserting these values onto leads 3316, 3318, 3320, 3132, 3134 and 3136, which are coupled to
other blocks in the tree processor/encoder-decoder 124. In the presently described embodiment the external
source is data bus 106 and VALUE_REGISTERS block 3130 is coupled to data bus 106 via a download register
bus 128. Register leads 3316 carry a signal corresponding to the value of limit and are coupled to DECIDE
block 3112 and to MODE_CONTROL block 3118. Register leads 3318 carry signals indicating the value of
qstep and are coupled to DECIDE block 3112 and to MODE_CONTROL block 3118. Register leads 3320 carry
signals indicating the value of compare and are coupled to DECIDE block 3112 and to MODE_CONTROL block
3118. Register leads 3132 carry signals indicating the value of ximage and are coupled to TP_ADDR_GEN
block 3114 and to MODE_CONTROL block 3118. Register leads 3134 carry signals indicating the value of
yimage and are coupled to TP_ADDR_GEN block 3114 and to MODE_CONTROL block 3118. Register lead 3136
carries a signal corresponding to the value of direction and is coupled to TP_ADDR_GEN block 3114,
MODE_CONTROL block 3118, buffer block 3122, Huffman encoder-decoder block 3120, and quantizer block
3116. To clarify the illustration, only selected ones of the connections between the VALUE_REGISTERS block
3130 and other blocks of the tree processor/encoder-decoder circuit 124 are illustrated in Figures 31 and 32.
VALUE_REGISTERS block 3130 is, in some embodiments, a memory mapped register addressable from bus
106.

Figure 34 is a block diagram of an embodiment of address generator TP_ADDR_GEN block 3114 of Figure
32. The TP_ADDR_GEN block 3114 of Figure 34 generates addresses to access selected two-by-two blocks
of data values in a tree of a sub-band decomposition using a counter circuit (see Figures 27-29 of copending
Patent Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data Compression and
Decompression" and the corresponding text). Figure 34 illustrates a three-octave counter circuit. The signals supplied
to TP_ADDR_GEN block 3114 are provided by MODE_CONTROL block 3118, CONTROL_COUNTER block
3124, and VALUE_REGISTERS block 3130. MODE_CONTROL block 3118 is coupled to TP_ADDR_GEN
block 3114 by leads 3402 which carry the three bit value new_mode. CONTROL_COUNTER 3124 is coupled
to TP_ADDR_GEN block 3114 by leads 3404 and 3406 which carry signals read_enable and write_enable,
respectively. VALUE_REGISTERS block 3130 is coupled to TP_ADDR_GEN block 3114 by register leads 3132
which carry a signal indicating the value of ximage. The output leads of TP_ADDR_GEN block 3114 comprise
tree processor address bus 3106 and octave leads 3322. The address generator TP_ADDR_GEN block 3114
comprises a series of separate counters: counter TreeRoot_x 3410, counter TreeRoot_y 3408, counter C3
3412, counter C2 3414, counter C1 3416, and counter sub_count 3418. TP_ADDR_GEN block 3114 also
comprises CONTROL_ENABLE block 3420, multiplexer 3428, multiplexer 3430, NOR gate 3436, AND gates 3422,
3424 and 3426, AND gates 3428, 3430 and 3432, multiplier block 3432 and adder block 3434.

Counter TreeRoot_x 3410 counts from 0 up to $ximage / 2^{OCT+1} - 1$ and counter TreeRoot_y 3408 counts
from 0 up to $yimage / 2^{OCT+1} - 1$, where OCT is the maximum number of octaves in the decomposition.
Counters C3, C2, C1, and sub_count are each 2-bit counters which count from 0 up to 3, and then return to 0.
Each of these counters takes on its next value in response to a respective count enable control signal supplied
by CONTROL_ENABLE block 3420. Figure 34 shows count enable control signals x_en, y_en, c3_en, c2_en,
c1_en, and sub_en being supplied to the counters TreeRoot_x, TreeRoot_y, C3, C2, C1 and sub_count,
respectively. When one of the counters reaches its maximum value, the counter asserts a carry out signal back
to the CONTROL_ENABLE block 3420. These carry out signals are denoted in Figure 34 as x_carry, y_carry,
c3_carry, c2_carry, c1_carry, and sub_carry.

CONTROL_ENABLE block 3420 responds to input signal new_mode on leads 3402 and to the carry out
signals to generate the counter enable signals. The octave signal output by CONTROL_ENABLE is the value
of the octave of the transform of the data values currently being addressed. The c1_carry, c2_carry, and
c3_carry signals are logically ANDed with the write_enable signal supplied from CONTROL_COUNTER block
3124 before entering the CONTROL_ENABLE block 3420. This AND operation is performed by AND gates 3422,
3424, and 3426 as shown in Figure 34. The counter enable signals from CONTROL_ENABLE block 3420 are
logically ANDed with the signal resulting from the logical ORing of read_enable and write_enable by OR gate
3436. These ANDing operations are performed by AND gates 3428, 3430, and 3432 as shown in Figure 34.
AND gates 3422, 3424, 3426, 3428, 3430, and 3432 function to gate the enable and carry signals with the
read_enable and write_enable signals such that the address space is cycled through twice per state, once for
reading and once for writing.

The CONTROL_ENABLE block 3420 outputs the enable signals enabling selected counters to increment
when the count value reaches 3 in the case of the 2-bit counters 3412, 3414, and 3418, or when the count
value reaches $ximage / 2^{OCT+1} - 1$ in the case of TreeRoot_x 3410, or when the count value reaches
$yimage / 2^{OCT+1} - 1$ in the case of TreeRoot_y 3408. The resulting x and y addresses of a two-by-two block
of data values of a given octave in a matrix of data values are obtained from the signals output by the various
counters as follows:

For octave = 0:
x = TreeRoot_x C3(2) C2(2) C1(2) sub_count(2) (equ. 11)
y = TreeRoot_y C3(1) C2(1) C1(1) sub_count(1) (equ. 12)
For octave = 1:
x = TreeRoot_x C3(2) C2(2) sub_count(2) 0 (equ. 13)
y = TreeRoot_y C3(1) C2(1) sub_count(1) 0 (equ. 14)
For octave = 2:
x = TreeRoot_x C3(2) sub_count(2) 0 0 (equ. 15)
y = TreeRoot_y C3(1) sub_count(1) 0 0 (equ. 16)

Figure 34 and equations 11-16 illustrate how the x and y address component values are generated by
multiplexers 3428 and 3430, respectively, depending on the value of octave. The (2) in equations 11-16 denotes
the least significant bit of a 2-bit counter whereas the (1) denotes the most significant bit of a 2-bit counter.
TreeRoot_x and TreeRoot_y are the multibit values output by counters 3410 and 3408, respectively. The output
of multiplexer 3430 is supplied to multiplier 3432 so that the value output by multiplexer 3430 is multiplied by
the value ximage. The value output by multiplier 3432 is added to the value output by multiplexer 3428 by adder
block 3434 resulting in the actual address being output onto address bus 3106 and to memory unit 116.
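The bit concatenations of equations 11-16 and the final multiply-and-add can be expressed as a short software sketch. The function name `tree_address` and its argument order are hypothetical; the sketch covers only the three-octave case shown in Figure 34.

```python
def tree_address(tree_root_x, tree_root_y, c3, c2, c1, sub_count, octave, ximage):
    """Return the memory address selected by the counters for the given octave."""

    def lsb(counter):
        return counter & 1          # the "(2)" bit in equations 11-16

    def msb(counter):
        return (counter >> 1) & 1   # the "(1)" bit in equations 11-16

    # Counters whose bits follow TreeRoot, most significant first.
    if octave == 0:
        fields, zeros = [c3, c2, c1, sub_count], 0
    elif octave == 1:
        fields, zeros = [c3, c2, sub_count], 1
    elif octave == 2:
        fields, zeros = [c3, sub_count], 2
    else:
        raise ValueError("this sketch covers octaves 0 to 2 only")

    x, y = tree_root_x, tree_root_y
    for counter in fields:
        x = (x << 1) | lsb(counter)   # x component, multiplexer 3428
        y = (y << 1) | msb(counter)   # y component, multiplexer 3430
    x <<= zeros                       # trailing zero bits of equations 13-16
    y <<= zeros
    return y * ximage + x             # multiplier 3432 and adder 3434
```

With ximage = 16, tree root (0, 0) and sub_count = 10 (binary), cycling C3 through 0 to 3 at octave 2 would give the four addresses of a root block at (x, y) = (0, 4), (8, 4), (0, 12) and (8, 12).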

Appendix A discloses one possible embodiment of CONTROL_ENABLE block 3420 of a three octave
address generator described in the hardware description language VHDL. An overview of the specific
implementation given in this VHDL code is provided below. The CONTROL_ENABLE block 3420 illustrated in Figure 34
and disclosed in Appendix A is a state machine which allows trees of a sub-band decomposition to be ascended
or descended as required by the encoding or decoding method. The CONTROL_ENABLE block 3420
generates enable signals such that the counters generate four addresses of a two-by-two block of data values at a
location in a tree designated by MODE_CONTROL block 3118. Instructions from the MODE_CONTROL block
3118 are read via leads 3402 which carry the value new_mode. Each state is visited for four consecutive cycles
so that the four addresses of the block are output by enabling the appropriate counter C3 3412, C2 3414 or
C1 3416. Once the appropriate counter reaches a count of 3, a carry out signal is sent back to
CONTROL_ENABLE block 3420 so that the next state is entered on the next cycle.

Figure 35 is a state table for the TP_ADDR_GEN block 3114 of Figure 34 when the TP_ADDR_GEN block
3114 traverses all the blocks of the tree illustrated in Figure 36. Figure 35 has rows, each of which represents
the generation of four address values of a block of data values. The (0-3) designation in Figure 35 represents
the four values output by a counter. The names of the states (i.e., up0, up1, down1) do not indicate movement
up or down the blocks of a tree but rather correspond with state names present in the VHDL code of Appendix
A. (In Appendix A, the states down1, down2 and down3 are all referred to as down1 to optimize the
implementation.) The state up0 in the top row of Figure 35, for example, corresponds to addressing the values of
the two-by-two block located at the root of the tree of Figure 36. In the tree of Figure 36 there are three octaves.
After these four addresses of the two-by-two block at the root of the tree are generated, the tree may be ascended
to octave 1 by entering the state up1.

Figure 36 illustrates a complete traversal of all the data values of one tree of a 3-octave sub-band
decomposition as well as the corresponding states of the CONTROL_ENABLE block of Figure 35. One such tree
exists for each of the "GH", "HG" and "GG" sub-bands of a sub-band decomposition.

First, before a tree of the sub-band decomposition is traversed, all low pass HHHHHH component values
of the decomposition are addressed by setting counter sub_count to output 00. Counter C3 3412 is incremented
through its four values. Counter TreeRoot_x is then incremented and counter C3 3412 is incremented through
its four values again. This process is repeated until TreeRoot_x reaches its maximum value. The process is
then repeated with TreeRoot_y being incremented. In this manner, all HHHHHH low pass components are
accessed. Equations 15 and 16 are used to compute the addresses of the HHHHHH low pass component data
values.

Next, the blocks of the "GH" subband of a tree given by TreeRoot_x and TreeRoot_y are addressed. This
"GH" subband corresponds to the value sub_count = 10 (sub_count(1) = 1 and sub_count(2) = 0). The up0
state shown in Figure 35 is used to generate the four addresses of the root block of the "GH" tree in accordance
with equation 15. The up1 state shown in Figure 35 is then used such that addresses corresponding to equations
13 and 14 are computed to access the desired two-by-two block of data values in octave 1. The four two-by-two
blocks in octave 0 are then accessed in accordance with equations 11 and 12. With TreeRoot_x and
TreeRoot_y and sub_count untouched, the states zz0, zz1, zz2 and zz3 are successively entered, four addresses
being generated in each state. After each one of these four states is exited, the C2 counter 3414 is incremented
by CONTROL_ENABLE block 3420 via the c2_en signal once in order to move to the next octave 0 block in
that branch of the tree. After incrementing in state zz3 is completed, the left hand branch of the tree is
exhausted. To move to the next two-by-two block, the C3 counter 3412 is incremented and the C2 counter 3414
is cycled through its four values to generate the four addresses of the next octave 1 block in state down1 in
accordance with equations 13 and 14. In this way, the TP_ADDR_GEN block 3114 generates the appropriate
addresses to traverse the tree in accordance with instructions received from MODE_CONTROL block 3118.
When the traversal of the "GH" sub-band tree is completed, the traversal of the sub-band decomposition moves
to the corresponding tree of the next sub-band without changing the value of TreeRoot_x and TreeRoot_y.
Accordingly, a "GH", "HG" and "GG" family of trees is traversed. After all the blocks of the three sub-band trees
have been traversed, the TreeRoot_x and TreeRoot_y values are changed to move to another family of
sub-band trees.
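The order in which the blocks of one tree are visited in a full traversal can be sketched as a nested loop. The generator name `traverse_full_tree` is hypothetical, and the state names in the comments are taken from the description of Figure 35; which state labels apply after the first branch is an assumption, since only the first branch is described in detail.

```python
def traverse_full_tree():
    """Yield (octave, c3, c2) for each two-by-two block of one tree, in visiting order.

    A value of None marks a counter that is not held fixed for that block: at the
    root, C3 is cycled through 0..3 to produce the four addresses; in an octave 1
    block, C3 is fixed and C2 is cycled; in an octave 0 block, C3 and C2 are fixed
    and C1 is cycled.
    """
    yield (2, None, None)        # root block, state up0
    for c3 in range(4):
        yield (1, c3, None)      # octave 1 block, state up1 and then down1
        for c2 in range(4):
            yield (0, c3, c2)    # octave 0 blocks, states zz0 to zz3
```

A complete traversal visits 1 + 4 + 16 = 21 blocks, each contributing four addresses. When MODE_CONTROL block 3118 stops processing a branch, the corresponding inner iterations would simply be skipped.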

To move to the next family of sub-band trees, the counter TreeRoot_x 3410 is incremented, and the C3
3412, C2 3414, C1 3416 counters are returned to 0. The process of traversing the new "GH" tree under the
control of the MODE_CONTROL block 3118 proceeds as before. Similarly, the corresponding "HG" and "GG"
trees are traversed. After TreeRoot_x 3410 reaches its final value, a whole row of tree families has been
searched. The counter TreeRoot_y 3408 is therefore incremented to move to the next row of tree families. This
process may be continued until all of the trees in the decomposition have been processed.

The low pass component HHHHHH (when sub_count = 00) does not have a tree decomposition. In
accordance with the present embodiment of the present invention, all of the low pass component data values are
read first as described above and are encoded before the tree encoder reads and encodes the three subbands.
The addresses of the data values in the HHHHHH subband are obtained from the octave 3 x and y addresses
with sub_count = 00. Counters C3 3412, TreeRoot_x 3410, and TreeRoot_y 3408 run through their respective
values. After the low pass component data values and all of the trees of all the sub-bands for the Y data values
have been encoded, the tree traversal method repeats on the U and V data values.

Although all the blocks of the tree of Figure 36 are traversed in the above example of a tree traversal, the
MODE_CONTROL block 3118 may under certain conditions decide to cease processing data values of a
particular branch and to move to the next branch of the tree as set forth in copending Patent Cooperation Treaty
(PCT) application filed March 30, 1994, entitled "Data Compression and Decompression". This occurs, for
example, when the value new_mode output by the MODE_CONTROL block 3118 indicates the mode STOP. In
this case, the state machine of CONTROL_ENABLE block 3420 will move to, depending on the current location
in the tree, either the next branch, or, if the branch just completed is the last branch of the last tree, the next
tree.

Figure 34 illustrates control signal inputs read_enable and write_enable being supplied to TP_ADDR_GEN
block 3114. These enable signals are provided because the reading of the new/old blocks and the writing of
the updated values to the old frame memory occur at different times. To avoid needing two address generators,
the enable signals of the counters C3 3412, C2 3414, and C1 3416 are logically ANDed with the logical OR of
the read_enable and write_enable signals. Similarly, the carry out signals of these counters are logically ANDed
with the write_enable signal. During time periods when the new/old blocks are read from memory, the
read_enable signal is set high and the write_enable signal is set low. This has the effect of generating the
addresses of a two-by-two block, but disabling the change of state at the end of the block count. The counters
therefore return to the original values they had at the start of the block count so that the same sequence of four
address values will be generated when the write_enable signal is set high. This time, however, the carry out
is enabled into the CONTROL_ENABLE block 3420. The next state is therefore entered at the conclusion of
the block count. In this manner, the address space is cycled through twice per state, once for reading and once
for writing.
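The effect of gating the carry out with write_enable can be illustrated with a small behavioral sketch of one 2-bit counter. The function name `run_pass` is hypothetical and the model ignores the surrounding state machine; it only shows that the read pass and the write pass produce the same four count values and that the state change is signalled only on the write pass.

```python
def run_pass(count, write_enable):
    """Step a 2-bit counter four times.

    Returns (values, final_count, state_advances), where state_advances models
    the carry out after it has been ANDed with write_enable.
    """
    values = []
    state_advances = False
    for _ in range(4):
        values.append(count)
        carry = count == 3
        # The carry out reaches CONTROL_ENABLE only while write_enable is high.
        state_advances = state_advances or (carry and bool(write_enable))
        count = (count + 1) & 0b11   # wraps from 3 back to 0
    return values, count, state_advances


# Read pass: addresses are generated but the state does not change.
read_values, count, advanced_on_read = run_pass(0, write_enable=0)
# Write pass: the same addresses are generated and the next state is entered.
write_values, count, advanced_on_write = run_pass(count, write_enable=1)
```

Because the counter wraps after four steps, it is back at its starting value when the write pass begins, which is what allows a single address generator to serve both passes.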

Figure 37 is a block diagram of one embodiment of quantizer block 3116 of Figure 31. As shown in Figure
31, quantizer block 3116 is coupled to MODE_CONTROL block 3118, a Huffman encoder-decoder block 3120,
delay block 3126, delay block 3128, and VALUE_REGISTERS block 3130. Input lead 3702 carries the signal
difference from MODE_CONTROL block 3118 which determines whether a difference between the new frame
and old frame is to be quantized or whether the new frame alone is to be quantized. Values new[x][y] and
old[x][y] are supplied on lines 3704 and 3706, respectively, and represent values from memory unit 116 delayed
by four clock cycles. Input leads 3708 and 3710 carry the values sign_inv and qindex_inv from the Huffman
encoder-decoder block 3120, respectively. Register leads 3318 and 3136 carry signals corresponding to the
values qstep and direction from VALUE_REGISTERS block 3130, respectively.

During encoding, quantizer block 3116 performs quantization on the values new[x][y], as dictated by the
signal difference and using the values old[x][y], and generates the output values qindex onto output leads 3712,
sign onto output lead 3714, and a quantized and then inverse quantized value old[x][y] onto data bus 3102.
The quantized and inverse quantized value old[x][y] is written back into memory unit 116.

During decoding, quantizer block 3116 performs inverse quantization on the values old[x][y], as dictated
by the signals difference, sign_inv, and qindex_inv, and generates an inverse quantized value, old[x][y], which
is supplied to the old portion of memory unit 116 via bus 3102. Lead 3136 carries the value direction supplied
by the VALUE_REGISTERS 3130.

The value direction controls whether the quantizer operates in the encoder mode or the decoder mode.
Figure 37 illustrates that multiplexers 3716 and 3718 use the direction signal to pass signals corresponding
to the appropriate mode (sign and qindex for encoder mode; sign_inv and qindex_inv for decoder mode).
Multiplexer 3720 passes either the difference of the new and old data values or passes the new value depending
on the value of the difference signal. Absolute value block ABS 3722 converts the value output by multiplexer
3720 to absolute value form and supplies the absolute value form value to block 3724. The output leads of
multiplexer 3720 are also coupled to sign block 3726. Sign block 3726 generates a sign signal onto lead 3714
and to multiplexer 3716.

Block 3724 of the quantizer block 3116 is a human visual system (HVS) weighted quantizer having a
threshold of qstep. The value on input leads 3728 denoted mag in Figure 37 is quantized via a modulo-qstep
division (see Figures 30 and 31 of copending Patent Cooperation Treaty (PCT) application filed March 30,
1994, entitled "Data Compression and Decompression" and the corresponding text). The resulting quantized
index value qindex is output onto leads 3712 to the Huffman encoder block 3120. Multiplexer 3716 receives
the sign signal on leads 3714 from block 3726 and also the sign_inv signal on lead 3708. Multiplexer 3716
passes the sign value in the encoder mode and passes the sign_inv value in the decoder mode. Likewise,
multiplexer 3718 has as two inputs the qindex signal on leads 3712 and the qindex_inv signal on leads 3710.
Multiplexer 3718 passes the qindex value in the encoder mode and the qindex_inv value in the decoder mode.
Inverse quantizer block 3730 inverse quantizes the value output by multiplexer 3718 by the value qstep to
generate the value qvalue. Block NEG 3732 reverses the sign of the value on the output lead of block 3730, denoted
qvalue in Figure 37. Multiplexer 3734 chooses between the positive and negative versions of qvalue as
determined by the signal output from multiplexer 3716.

In the encoder mode, if the difference signal is asserted, then output leads 3712 qindex carry the quantized
magnitude of the difference between the new and old data values and the output leads 3736 of multiplexer
3734 carry the inverse quantization of this quantized magnitude of the difference between the new and old
values. In the encoder mode, if the difference input is deasserted, then the output leads 3712 qindex carry
only the quantized magnitude of the new data value and the value on leads 3736 is the inverse quantization
of the quantized magnitude of the new data value.

Adder block 3738 adds the inverse quantized value on leads 3736 to the old[x][y] data value and supplies
the result to multiplexer 3740. Accordingly, when the difference signal is asserted, the difference between the
old inverse quantized value on leads 3706 and the inverse quantized value produced by inverse quantizer 3730
is determined by adding in block 3738 the opposite of the inverse quantized output of block 3730 to the old
inverse quantized value. Multiplexer 3740 passes the output of adder block 3738 back into the OLD portion of
memory unit 116 via bus 3102. If, on the other hand, the difference signal is not asserted, then multiplexer
3740 passes the value on leads 3736 to the OLD portion of memory unit 116 via bus 3102. Accordingly, a frame
of inverse quantized values of the most recently encoded frame is maintained in the old portion of memory
unit 116 during encoding.
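The encoder-mode data path of Figure 37 can be followed for a single value with the sketch below. It is a simplified reading, not the circuit: the function name `encode_value` is hypothetical, the HVS weighting of block 3724 is ignored, quantization is assumed to be a plain integer division by qstep with multiplication by qstep as its inverse, and the signed inverse quantized value is assumed to be added to the old value when difference is asserted.

```python
def encode_value(new, old, difference, qstep):
    """Trace one value through the encoder-mode quantizer data path.

    Returns (qindex, sign, new_old), where new_old is the value written back
    to the OLD portion of memory.
    """
    value = new - old if difference else new          # multiplexer 3720
    mag = abs(value)                                  # ABS block 3722
    sign = value < 0                                  # sign block 3726
    qindex = mag // qstep                             # quantizer 3724 (assumed plain division)
    qvalue = qindex * qstep                           # inverse quantizer 3730 (assumed)
    signed_q = -qvalue if sign else qvalue            # NEG 3732 and multiplexer 3734
    new_old = old + signed_q if difference else signed_q   # adder 3738, multiplexer 3740
    return qindex, sign, new_old
```

Under these assumptions the value written back is exactly what a decoder would reconstruct from qindex and sign, which is why the old frame held by the encoder stays in step with the decoder.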

In accordance with one embodiment of the present invention, the value of qstep is chosen so that
qstep $= 2^n$, where $0 \le n \le 7$, so that quantizer block 3724 and inverse quantizer 3730 perform only
shifts by n bits. Block 3724 then becomes in VHDL, where >> denotes a shift to the right and where mag
denotes the value output by block 3722:
CASE n IS
    WHEN 0 => qindex := mag;
    WHEN 1 => qindex := mag >> 1;
    ...
    WHEN 7 => qindex := mag >> 7;
END CASE;

Similarly, block 3730 is described in VHDL as follows:
CASE n IS
    WHEN 0 => qvalue := qindex;
    WHEN 1 => qvalue := (qindex << 1) & "0";
    WHEN 2 => qvalue := (qindex << 2) & "01";
    ...
    WHEN 7 => qvalue := (qindex << 7) & "0111111";
END CASE;
where << denotes a shift to the left and where & denotes concatenation. The factor concatenated after the
shift is $2^{n-1} - 1$.
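
As a minimal sketch of the power-of-two case, the two blocks can be modelled as below. The function names are hypothetical. The model reads the quantizer as a division by 2^n (a right shift) and the inverse quantizer as a multiplication by 2^n with the n low-order bits filled by the factor 2^(n-1) - 1; that reading of the concatenation is an assumption.

```python
def quantize_shift(mag: int, n: int) -> int:
    """Quantize a non-negative magnitude with qstep = 2**n (0 <= n <= 7)."""
    if not 0 <= n <= 7:
        raise ValueError("n must be in the range 0..7")
    return mag >> n


def dequantize_shift(qindex: int, n: int) -> int:
    """Inverse quantize: scale by 2**n and fill the low n bits with 2**(n-1) - 1."""
    if not 0 <= n <= 7:
        raise ValueError("n must be in the range 0..7")
    if n == 0:
        return qindex
    # Assumed meaning of the concatenation: the factor occupies the n low bits.
    return (qindex << n) | ((1 << (n - 1)) - 1)
```

For example, with n = 3 a magnitude of 45 quantizes to 5 and is reconstructed as 43, which lies inside the interval 40..47 that maps to the same index.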

The tree processor/encoder-decoder circuit 124 of Figure 31 also includes a MODE_CONTROL block
3118. In the encoder mode, MODE_CONTROL block 3118 determines mode changes as set forth in copending
Patent Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data Compression and
Decompression" when trees of data values are traversed to compress the data values into a compressed data stream.
In the decoder mode, MODE_CONTROL block 3118 determines mode changes as set forth in copending
Patent Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data Compression and
Decompression" when trees of data values are recreated from an incoming compressed data stream of tokens and data
values.

MODE_CONTROL block 3118 receives signals from DECIDE block 3112, CONTROL_COUNTER block
3124, TP_ADDR_GEN block 3114, and VALUE_REGISTERS block 3130. MODE_CONTROL block 3118
receives the seven flag values from DECIDE block 3112. The input from CONTROL_COUNTER block 3124 is
a four-bit state vector 3138 indicating the state of the CONTROL_COUNTER block 3124. Four bits are needed
because the CONTROL_COUNTER block 3124 can be in one of nine states. The input from TP_ADDR_GEN
block 3114 is the octave signal carried by leads 3322. The VALUE_REGISTERS block 3130 supplies the values
on leads 3316, 3318, 3320, 3132, 3134, and 3136 to MODE_CONTROL block 3118. Additionally, in the decoder
mode, buffer 3122 supplies token values which are not Huffman decoded onto leads 3202 and to the
MODE_CONTROL block 3118 as shown in Figure 32.

MODE_CONTROL block 3118 outputs a value new_mode which is supplied to TP_ADDR_GEN block 3114
via leads 3402 as well as a token length value T_L which is supplied to buffer block 3122 via leads 3140. In
the encoder mode, MODE_CONTROL block 3118 also generates and supplies tokens to buffer block 3122 via
leads 3202. Leads 3202 are therefore bidirectional to carry token values from MODE_CONTROL block 3118
to buffer block 3122 in the encoder mode, and to carry token values from buffer 3122 to MODE_CONTROL
block 3118 in the decoder mode. The token length value T_L, on the other hand, is supplied by
MODE_CONTROL block 3118 to buffer block 3122 in both the encoder and decoder modes. MODE_CONTROL
block 3118 also generates the difference signal and supplies the difference signal to quantizer block 3116 via
lead 3142. MODE_CONTROL block 3118 asserts the difference signal when differences between new and
old values are to be quantized and deasserts the difference signal when only new values are to be quantized.
Appendix B is a description of an embodiment of the MODE_CONTROL block 3118 in the VHDL language.

In the encoding process, the MODE_CONTROL block 3118 initially assumes a mode, called pro_mode,
from the block immediately below the block presently being encoded in the present tree. For example, the
blocks in Figure 36 corresponding to states zz0, ..., zz3 in the left-most branch inherit their respective
pro_modes from the left-most octave 1 block. Similarly, the left-most octave 1 block in Figure 36 inherits its pro_mode
from the root of the tree in octave 2. After the data values of the new and old blocks are read and after the
DECIDE block 3112 has generated the flags for the new block as described above, the state machine of
MODE_CONTROL block 3118 determines the new_mode for the new block based on the new data values,
the flags, and the pro_mode. The value of new_mode, once determined, is then stored as the current mode of
the present block in a mode latch. There is one mode latch for each octave of a tree and one for the low pass
data values. The mode latches form a stack pointed to by octave so that the mode latches contain the mode
in which each of the blocks of the tree was encoded.

The tree processor circuit of Figures 31 and 32 also comprises a Huffman encoder-decoder block 3120. 
In the encoder mode, inputs to the Huffman encoder-decoder block 3120 are supplied by quantizer block 3116. 
These inputs comprise the qindex value and the sign signal and are carried by leads 3712 and 3714,
respectively. The outputs of Huffman encoder-decoder 3120 comprise the Huffman encoded value on leads 3142
and the Huffman length H_L on leads 3144, both of which are supplied to buffer block 3122.

In the decoder mode, the input to the Huffman encoder-decoder block 3120 is the Huffman encoded value 
carried by leads 3204 from buffer block 3122. The outputs of the Huffman encoder-decoder 3120 comprise 
the Huffman length H_L on leads 3144 and the sign_inv and qindex_inv values supplied to quantizer block
3116 via leads 3708 and 3710, respectively. 

The Huffman encoder-decoder block 3120 implements the Huffman table shown in Table 2 using
combinatorial logic.

Table 2



In the encoder mode, qindex values are converted into corresponding Huffman codes for incorporation
into the compressed data stream. Tokens generated by the MODE_CONTROL block 3118, on the other hand,
are not encoded but rather are written directly into the compressed data stream.

Figure 38 illustrates one possible embodiment of buffer block 3122 of Figures 31 and 32. The function of
buffer block 3122 in the encoder mode is to assemble encoded data values and tokens into a single serial
compressed data stream. In the decoder mode, the function of buffer block 3122 is to disassemble a compressed
serial data stream into encoded data values and tokens. Complexity is introduced into buffer block 3122 due
to the different lengths of different Huffman encoded data values. As illustrated in Figure 31, buffer 3122 is
coupled to FIFO buffer 120 via input-output leads 130, to MODE_CONTROL block 3118 via token value leads
3202 and token length leads 3140, to Huffman encoder-decoder 3120 via leads 3142 and Huffman length leads
3144, to CONTROL_COUNTER 3124 via cycle select leads 3802, and to VALUE_REGISTERS 3130 via leads
3136.

The direction signal carried on leads 3136 from VALUE_REGISTERS block 3130 determines whether the
buffer block 3122 operates in the encoder mode or in the decoder mode. In encoder mode, multiplexers 3804,
3806, 3808 and 3814 select the values corresponding to their "E" inputs in Figure 38. In the encoder mode,
the buffer block 3122 processes the Huffman encoded value signal present on leads 3142, the token value
signal present on leads 3202, the cycle select signal on leads 3802, the Huffman length signal H_L on leads
3144, and the token length signal T_L on leads 3140. The cycle select signal, supplied by CONTROL_COUNTER
block 3124 via leads 3802, is supplied to multiplexers 3810 and 3812 to control whether a Huffman
encoded value (received from Huffman encoder-decoder block 3120) or whether a non-encoded token value
(received from MODE_CONTROL block 3118) is the value presently being assembled into the output data stream.

Figure 39 illustrates a simplified diagram of the buffer block 3122 of Figure 38 when configured in encoder
mode. The value s is a running modulo sixteen sum of the input token length values and Huffman value length
values. The circuit which determines s comprises adder block 3902, modulo sixteen divider block 3904, and
delay block 3906. When the incoming length value added to the prior value s produces a length result of sixteen
or greater, block 3904 subtracts sixteen from this length result to determine the new value of s. Comparator
block 3908 also sends a signal high_low to input lead 3916 of multiplexer 3901 indicating that s has reached or
exceeded sixteen. Figure 39 shows a barrel shifter 3912 receiving data input values from the output data leads of
multiplexer 3901 and from the output data leads of multiplexer 3810. Barrel shifter 3912 sends a 32-bit output
signal to a 32-bit buffer 3914. The lower 16-bit output of 32-bit buffer 3914 constitutes the encoded bit stream
output of the video encoder/decoder chip which is output onto input/output leads 130.

When the prior value of s plus the incoming value length is sixteen or greater, then the lower sixteen bits
of buffer 3914 are sent out to FIFO buffer 120 and multiplexer 3901 is set to pass the upper sixteen bits of
buffer 3914 back into the lower sixteen bit positions in barrel shifter 3912. The value s is then decremented
by sixteen. These passed back bits will next become some of the bits in the lower sixteen bits of buffer 3914,
on which a subsequent incoming encoded value or token received from multiplexer 3810 will be stacked by the
barrel shifter starting at location s to make sixteen or more packed bits.

Alternatively, if the value of s plus the length of the new incoming value is less than sixteen, then
multiplexer 3901 is controlled to pass the lower sixteen bits of buffer 3914 back to barrel shifter 3912 and no bits
are applied to FIFO buffer 120. The bits of a subsequent incoming encoded value or token from multiplexer
3810 will be stacked on top of the bits of prior encoded data values or tokens in barrel shifter 3912. Because
the value s did not reach sixteen, s is not decremented by sixteen.

Figure 40 illustrates a typical output of the barrel shifter 3912 of the buffer 3122 in encoder mode. The
maximum length of a Huffman encoded word is sixteen bits. All tokens are two bits in length, where length is
the number of bits in the new encoded value or token. The value s in Figure 40 indicates the bit position in the
barrel shifter 3912 immediately following the last encoded data value or token present in the barrel shifter.
Accordingly, a new encoded value or token is written into barrel shifter 3912 at positions s ... s + length. The
resulting 32-bit output of the barrel shifter is rewritten to the 32-bit buffer 3914. The comparator block compares
the new value of s + length to sixteen. If this value s + length is sixteen or greater as illustrated in Figure 40,
then the control signal high_low on multiplexer input lead 3916 is asserted. The lower sixteen bits of the buffer
are therefore already completely packed with either bits of data values and/or with bits of tokens. These lower
sixteen bits are therefore output to comprise part of the output data stream. The upper sixteen bits, which are
incompletely packed with data values and/or tokens, are sent back to the lower sixteen bit positions in the barrel
shifter so that the remaining unpacked bits in the lower sixteen bits can be packed with new data bits or new
token bits.

If, on the other hand, this value s + length is fifteen or less, then there remain unpacked bits in the lower
sixteen bit positions in barrel shifter 3912. These lower bits in barrel shifter 3912 can therefore not yet be output
via buffer 3914 onto lines 130. Only when s + length is sixteen or greater will the contents of barrel shifter
3912 be written to buffer 3914 so that the lower sixteen bits will be output via leads 130.
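
The encoder-mode packing described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the circuit of Figures 38 to 40: the class name BitPacker is hypothetical, and the bit ordering (new bits placed at position s counted from the least significant bit) is assumed, since the source does not fix it here.

```python
from typing import Optional


class BitPacker:
    """Packs variable-length codes (1..16 bits) into 16-bit output words."""

    def __init__(self) -> None:
        self.s = 0        # running length modulo sixteen (next free bit position)
        self.buffer = 0   # models the 32-bit buffer fed by the barrel shifter

    def push(self, value: int, length: int) -> Optional[int]:
        """Add one token or Huffman code; return a 16-bit word when one is full."""
        if not 1 <= length <= 16:
            raise ValueError("length must be in the range 1..16")
        # Barrel shifter: stack the new bits on top of the bits already held.
        self.buffer |= (value & ((1 << length) - 1)) << self.s
        self.s += length
        if self.s >= 16:
            word = self.buffer & 0xFFFF   # lower sixteen bits are fully packed
            self.buffer >>= 16            # upper half moves back to the lower half
            self.s -= 16                  # s is decremented by sixteen
            return word
        return None                       # fewer than sixteen bits: nothing output
```

Because s never exceeds fifteen before a push and no code is longer than sixteen bits, s + length stays below thirty-two, so a 32-bit buffer is always sufficient in this model.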

In the decoder mode, buffer 3122 receives an encoded data stream on leads 130, the token length signal
T_L on leads 3140 from MODE_CONTROL block 3118, the Huffman encoded length signal H_L on leads 3144,
and the control signal cycle select on lead 3802. Multiplexers 3804, 3806, and 3808 are controlled to select
values on their respective "D" inputs. Cycle select signal 3802 selects between the Huffman encoded length
H_L and the token length T_L depending on whether a data value or a token is being extracted from the
incoming data stream.

Figure 41 illustrates a simplified diagram of the buffer block 3122 configured in the decoder mode. The
value s is a running modulo thirty-two sum of the input token length values and Huffman value length values.
The circuit which determines the value of s comprises adder block 4002, modulo thirty-two divider block 4004,
and delay block 4006. When the incoming length value added to s results in a value greater than thirty-two,
modulo thirty-two divider block 4004 subtracts thirty-two from this value. A comparator block 4008 sends a
signal to buffer 3914 indicating when s has reached a value greater than or equal to thirty-two. Additionally,
comparator block 4008 sends a signal to both buffer 3914 and to multiplexers 3901 and 4010 indicating when
s has reached a value greater than or equal to sixteen.

Buffer 3122 in the decoder mode also comprises buffer 3914, multiplexers 3901 and 4010, and barrel
shifter 4012. In the case of a Huffman encoded data value being the next value in the incoming data stream, sixteen
bits of the encoded data stream that are present in barrel shifter 4012 are passed via output leads 3204 to
the Huffman decoder block 3120. The number of bits in the sixteen bits that represent an encoded data value
depends on the data value itself in accordance with the Huffman code used. In the case of a token being the
next value in the incoming data stream, only the two most significant bits from barrel shifter 4012 are used
as the token value which is output onto leads 3202 to MODE_CONTROL block 3118. The remaining fourteen
bits are not output during this cycle. After a number of bits of either an encoded data value or a two-bit token
is output, the value of s is updated to point directly to the first bit of the bits in barrel shifter 4012 which follows
the bit last output. The circuit comprising adder block 4002, modulo block 4004, and delay element 4006 adds
the length of the previously output value or token to s modulo thirty-two to determine the starting location of
the next value or token in barrel shifter 4012. Comparator block 4008 evaluates the value of s plus the incoming
length value, and transmits an active value on lead 4014 when this value is greater than or equal to sixteen
and also transmits an active value on lead 4016 if this value is greater than or equal to thirty-two. When s is
greater than or equal to sixteen, the buffer 3914 will read in a new sixteen bits of encoded bit stream bits into its
lower half. When s ≥ 32, the buffer 3914 will read a new sixteen bits into its upper half. The two multiplexers
4010 and 3910 following the buffer 3914 rearrange the order of the low and high halves of the buffer 3914 to
maintain at the input leads of barrel shifter 4012 the original order of the encoded data stream.
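
The decoder-mode behavior can be sketched in the same spirit. The model below is an assumption-laden illustration: the class name BitUnpacker is hypothetical, bits are taken least significant first (the source speaks of the most significant bits of the barrel shifter output, so the real bit ordering may differ), and an exhausted input is padded with zero words.

```python
from typing import Iterable, List


class BitUnpacker:
    """Extracts variable-length codes from a stream of 16-bit words."""

    def __init__(self, words: Iterable[int]) -> None:
        self._words = iter(words)
        # Lower and upper halves of the 32-bit buffer.
        self.halves: List[int] = [self._next_word(), self._next_word()]
        self.s = 0  # running position, modulo thirty-two

    def _next_word(self) -> int:
        return next(self._words, 0) & 0xFFFF  # assumed zero padding at end of stream

    def peek16(self) -> int:
        """Sixteen bits starting at position s, wrapping around the 32-bit buffer."""
        window = self.halves[0] | (self.halves[1] << 16)
        rotated = ((window >> self.s) | (window << (32 - self.s))) & 0xFFFFFFFF
        return rotated & 0xFFFF

    def consume(self, length: int) -> None:
        """Advance past a token or code of the given length and refill as needed."""
        if not 1 <= length <= 16:
            raise ValueError("length must be in the range 1..16")
        end = self.s + length
        if self.s < 16 <= end:
            self.halves[0] = self._next_word()   # lower half fully used: reload it
        if end >= 32:
            self.halves[1] = self._next_word()   # upper half fully used: reload it
        self.s = end % 32
```

A caller would peek sixteen bits, let the Huffman decoder (or the two-bit token logic) determine how many of them were actually used, and then call consume with that length.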

The tree processor/encoder-decoder circuit 124 of Figures 31 and 32 comprises a CONTROL_COUNTER
block 3124. CONTROL_COUNTER block 3124 controls overall timing and sequencing of the other blocks of
the tree processor/encoder/decoder circuit 124 by outputting the control signals that determine the timing of
the operations that these blocks perform. In accordance with one embodiment of the present invention, the
tree processor/encoder/decoder 124 is fully pipelined in a nine stage pipeline sequence, each stage occupying
one clock cycle. Appendix C illustrates an embodiment of CONTROL_COUNTER block 3124 described in
VHDL code.

The signals output by CONTROL_COUNTER block 3124 comprise a read_enable signal on lead 3404,
which is active during read cycles, and a write_enable signal on lead 3406, which is active during write cycles.
The signals output also comprise memory control signals on leads 3108 and 3110, which control the old and
new portions of memory unit 116, respectively, for reading from memory or writing to memory. The signals
output also comprise a 4-bit state vector on lead 3138, which supplies MODE_CONTROL block 3118 with the
current cycle. The four-bit state vector counts through values 1 through 4 during the "skip" cycle, the value 5
during the "token" cycle, and the values 6-9 during the "data" cycle. The signals output by CONTROL_COUNTER
block 3124 also comprise a cycle state value on leads 3802, which signals buffer 3122 when a token cycle
or data cycle is taking place.

Figure 42 illustrates a pipelined encoding/decoding process controlled by CONTROL_COUNTER block
3124. Cycles are divided into three types: data cycles - when Huffman encoded/decoded data is being
output/input into the encoded bit stream and when old frame values are being written back to memory; token
cycles - when a token is being output/input; and skip cycles - the remaining case when no encoded/decoded data
is output to or received from the encoded bit stream. A counter in CONTROL_COUNTER block 3124 counts
up to 8 then resets to 0. At each sequence of the count, this counter decodes various control signals depending
on the current MODE. The pipeline cycles are:

0) read old[0][0] and, in encode, new[0][0]; skip cycle.

1) read old[1][0] and, in encode, new[1][0]; skip cycle.

2) read old[0][1] and, in encode, new[0][1]; skip cycle.

3) read old[1][1] and, in encode, new[1][1]; skip cycle.

4) DECIDE block outputs flags; MODE_CONTROL writes/reads token into/from coded data stream:
   generates new_mode, outputs tokens in encode;
   generates new_mode, inputs tokens in decode;
   token cycle.

5) Huffman encode/decode qindex[0][0], and write old[0][0]; data cycle.

6) Huffman encode/decode qindex[1][0], and write old[1][0]; data cycle.

7) Huffman encode/decode qindex[0][1], and write old[0][1]; data cycle.

8) Huffman encode/decode qindex[1][1], and write old[1][1]; data cycle.

Figure 42 illustrates that once the new_mode is calculated, another block of data values in the tree can
be processed. The tree processor/encoder/decoder is thus fully pipelined, and can process four new
transformed data values every five clock cycles. To change the pipeline sequence, it is only required that the control
signals in the CONTROL_COUNTER block 3124 be reprogrammed.
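
The throughput figure of four data values every five clock cycles follows from overlapping the data cycles of one block with the read cycles of the next. A minimal sketch is given below; the function names and the constant BLOCK_INTERVAL are hypothetical, and the assumption that a new block starts exactly five cycles after the previous one is inferred from the statement above rather than taken from Figure 42.

```python
from typing import Dict, List, Tuple

BLOCK_INTERVAL = 5  # assumed: the next block starts once new_mode is known


def cycle_type(count: int) -> str:
    """Classify counter values 0..8 of the nine-stage sequence."""
    if not 0 <= count <= 8:
        raise ValueError("count must be in the range 0..8")
    if count <= 3:
        return "skip"    # reads of old (and, in encode, new) values
    if count == 4:
        return "token"   # flags, new_mode, token output or input
    return "data"        # Huffman encode/decode and write-back of old values


def schedule(num_blocks: int) -> Dict[int, List[Tuple[int, str]]]:
    """Map each absolute clock cycle to the (block, cycle type) pairs active in it."""
    timeline: Dict[int, List[Tuple[int, str]]] = {}
    for block in range(num_blocks):
        for count in range(9):
            clock = block * BLOCK_INTERVAL + count
            timeline.setdefault(clock, []).append((block, cycle_type(count)))
    return timeline
```

In this model the data cycles of block 0 (clocks 5 to 8) coincide with the skip cycles of block 1, so each additional two-by-two block costs only five clocks.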

ADDITIONAL EMBODIMENTS

In accordance with the above-described embodiments, digital video in 4:1:1 format is output from A/D video
decoder 110 on lines 202 to the discrete wavelet transform circuit 122 of video encoder/decoder circuit 112
row by row in raster-scan form. Figure 43 illustrates another embodiment in accordance with the present
invention. Analog video is supplied from video source 104 to an A/D video decoder circuit 4300. The A/D video
decoder circuit 4300, which may, for example, be manufactured by Philips, outputs digital video in 4:2:2 format
on lines 4301 to a horizontal decimeter circuit 4302. For each two data values input to the horizontal decimeter
circuit 4302, the horizontal decimeter circuit 4302 performs low pass filtering and outputs one data value. The
decimated and low pass filtered output of horizontal decimeter circuit 4302 is supplied to a memory unit 114
such that data values are written into and stored in memory unit 114 as illustrated in Figure 43. The digital
video in 4:2:2 format on lines 4301 occurs at a frame rate of 30 frames per second, each frame consisting of
two fields. By discarding the odd field, the full 33.3 ms frame period is available for transforming and
compressing/decompressing the remaining even field. The even fields are low-pass filtered by the horizontal
decimeter circuit 4302 such that the output of horizontal decimeter circuit 4302 occurs at a rate of 30 frames per
second, each frame consisting of only one field. Memory unit 114 contains 640 x 240 total image data values.
There are 320 x 240 Y data values, as well as 160 x 240 U data values, as well as 160 x 240 V data values.
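
The two-to-one decimation and the resulting memory budget can be checked with a short sketch. The filter itself is not specified in this passage, so the pairwise average used below is purely an assumed stand-in for the low pass filter, and the name decimate_row is hypothetical.

```python
from typing import List


def decimate_row(row: List[int]) -> List[int]:
    """Output one low pass filtered value for each two input values.

    Assumption: a two-tap average stands in for the unspecified low pass filter.
    """
    if len(row) % 2:
        raise ValueError("row length must be even")
    return [(row[i] + row[i + 1]) // 2 for i in range(0, len(row), 2)]


# Data values held in memory unit 114 for one (even) field after decimation.
Y_VALUES = 320 * 240
U_VALUES = 160 * 240
V_VALUES = 160 * 240
TOTAL_VALUES = Y_VALUES + U_VALUES + V_VALUES  # equals 640 * 240
```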

In order to perform a forward transform, the Y values from memory unit 114 are read by video
encoder/decoder chip 112 as described above and are processed by the row convolver and column convolver of the
discrete wavelet transform circuit 122 such that a three octave sub-band decomposition of Y values is written into
memory unit 116. The three octave sub-band decomposition for the Y values is illustrated in Figure 43 as being
written into a Y portion 4303 of the new portion of memory unit 116.

After the three octave sub-band decomposition for the Y values has been written into memory unit 116,
the video encoder/decoder chip 112 reads the U image data values from memory unit 114 but bypasses the
row convolver. Accordingly, individual columns of U values in memory unit 114 are digitally filtered into low and
high pass components by the column convolver. The high pass component G is discarded and the low pass
component H is written into U portion 4304 of the new portion of memory unit 116 illustrated in Figure 43. After
the U portion 4304 of memory unit 116 has been written with the low pass H component of the U values, video
encoder/decoder chip 112 reads these U values from U portion 4304 and processes these U data values using
both the row convolver and column convolver of the discrete wavelet transform circuit 122 to perform an
additional two octaves of transform to generate a U value sub-band decomposition. The U value sub-band
decomposition is stored in U portion 4304 of memory unit 116. Similarly, the V image data values in memory
unit 114 are read by video encoder/decoder chip 112 into the column convolver of the discrete wavelet transform
circuit 122, the high pass component G being discarded and the low pass component H being written into V
portion 4305 of the new portion of memory unit 116. The V data values of V portion 4305 are then read by the
video encoder/decoder chip 112 and processed by both the row convolver and the column convolver of discrete
wavelet transform circuit 122 to generate a V sub-band decomposition corresponding to the U sub-band
decomposition stored in U portion 4304. This process completes a forward three octave discrete wavelet
transform comparable to the 4:1:1 three octave discrete wavelet transform described above in connection with
Figures 3A-3C. Y portion 4303 of memory unit 116 comprises 320 x 240 data value memory locations; U portion
4304 comprises 160 x 120 data value memory locations; and V portion 4305 comprises 160 x 120 data value
memory locations.

The DWT address generator 508 illustrated in Figure 5 generates a sequence of 19-bit addresses on output
lines OUT2. In accordance with the presently described embodiment, however, memory unit 114 is a dynamic
random access memory (DRAM). This memory unit 114 is loaded from horizontal decimeter circuit 4302 and
is either read from or written to by the video encoder/decoder chip 112. For example, in order for the video
encoder/decoder chip 112 to access the Y data values in memory unit 114 the inc_R value supplied to DWT
address generator 508 by control block 506 is set to 2. This causes the DWT address generator 508 of the
video encoder/decoder chip 112 to increment through even addresses as illustrated in Figure 43 such that only
the Y values in memory unit 114 are read. After all the Y values are read from memory unit 114 and are
transformed into a Y sub-band decomposition, then base_u_R is changed to 1 and the Channel_start_r is set so
that BASE_MUX 3002 of Figure 30 selects the base_u_R to address the first U data value in memory unit 114.
Subsequent U data values are accessed because the inc_R value is set to 4 such that only U data values in
memory unit 114 are accessed. Similarly, the V data values are accessed by setting the base_v_R value to
3 and setting the Channel_start_r value such that BASE_MUX 3002 selects the base_v_R input leads.
Successive V data values are read from memory unit 114 because the inc_R remains at 4.
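
The base-plus-increment addressing of the interleaved Y, U and V values can be illustrated with a few lines. This is a sketch under assumptions: the function name channel_addresses is hypothetical, the Y base address is taken to be 0 (the text says only that even addresses are used), the interleaving Y U Y V is inferred from the bases and strides, and V is assumed to use the same stride of 4 as U.

```python
from typing import List


def channel_addresses(base: int, inc: int, count: int) -> List[int]:
    """Addresses visited when starting at base and stepping by inc."""
    return [base + i * inc for i in range(count)]


# Assumed interleaving of decimated 4:2:2 data in memory unit 114.
memory = ["Y0", "U0", "Y1", "V0", "Y2", "U1", "Y3", "V1"]

y_values = [memory[a] for a in channel_addresses(base=0, inc=2, count=4)]
u_values = [memory[a] for a in channel_addresses(base=1, inc=4, count=2)]
v_values = [memory[a] for a in channel_addresses(base=3, inc=4, count=2)]
```

With these settings each channel is read without touching the other two, which is why only the base value and the increment need to change between the Y, U and V passes.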

Because in accordance with this embodiment the video encoder/decoder chip 112 reads memory unit 114, 
the DWT address generator 508 supplies both read addresses and write addresses to memory unit 114. The
read address bus 3018 and the write address bus 3020 of Figure 30 are therefore multiplexed together (not
shown) to supply the addresses on the OUT2 output lines of the DWT address generator.

To perform the inverse transform on a three octave sub-band decomposition stored in memory unit 116
of Figure 43, the row and column convolvers of the video encoder/decoder chip 112 require both low and high
pass components to perform the inverse transform. When performing the octave 0 inverse transform on the
U and V data values of the sub-band decomposition, zeros are inserted when the video encoder/decoder chip
112 is to read high pass transformed data values. In the octave 0 inverse transform, the row convolver is
bypassed such that the output of the column convolver is written directly to the appropriate locations in the
memory unit 114 for the U and V inverse transform data values. When the Y transform data values in memory unit
116 are to be inverse transformed, on the other hand, both the column convolver and the row convolver of the
video encoder/decoder chip 112 are used on each of the three octaves of the inverse transform. The resulting
inverse transformed Y data values are written into memory unit 114 in the appropriate locations as indicated
in Figure 43.

Figure 44 illustrates a sequence of reading and writing Y data values from the Y portion of the new portion
of memory unit 116 in accordance with the embodiment of the present invention illustrated in Figure 1 where
memory unit 116 is a static random access memory (SRAM). The dots in Figure 44 represent individual memory
locations in a two-dimensional matrix of memory locations adequately wide and deep to store an entire
sub-band decomposition of the Y values in a single two-dimensional matrix. The discrete wavelet transform chip
122 reads the memory location indicated R0 during a first time period, outputs a transformed data value during
a second time period to the memory location indicated W1, reads another data value from the memory location
denoted R2, writes a transformed data value to the memory location denoted W3 and so forth. If memory unit
116 is realized as a dynamic random access memory (DRAM), addressing memory unit 116 in this manner
results in a different row of the memory unit being accessed each successive time period. When successive
accesses are made to different rows of standard dynamic random access memory, a row address select (RAS)
cycle must be performed each time the row address changes. On the other hand, if successive accesses are
performed on memory locations that fall in the same row, then only column address select (CAS) cycles need
to be performed. Performing a CAS cycle is significantly faster in a standard dynamic random access memory
than a RAS cycle. Accordingly, when memory unit 116 is realized as a dynamic random access memory and
when memory unit 116 is read and written in the fashion illustrated in Figure 44, memory accesses are slow.

Figure 45 illustrates a sequence of reading and writing memory unit 116 in accordance with another
embodiment of the present invention wherein memory unit 116 is realized as a dynamic random access memory.
Again, the dots denote individual memory locations and the matrix of memory locations is assumed to be wide
enough and deep enough to accommodate the Y portion of the sub-band decomposition in a single
two-dimensional matrix. In the first time period, the memory location designated R0 is read. In the next time period,
the memory location R1 is read, then R2 is read in a subsequent time period, then R3 is read in a subsequent
period, and so forth. In this way one row of low pass component HH values is read into the video
encoder/decoder chip 112 using only one RAS cycle and multiple CAS cycles. Then, a second row of low pass component
HH data values is read as designated in Figure 45 by numerals R160, R161, R162 and so forth. The last low
pass component data value to be read in the second row is designated R319. This row is also read into the
video encoder/decoder chip 112 using only one RAS cycle and multiple CAS cycles. Figure 15 illustrates that,
after these data values are read, the resulting octave 1 transformed data values determined by the discrete
wavelet transform chip 122 are present in the line delays designated 1334 and 1340 illustrated in Figure
13. At this point in this embodiment of the present invention, the row convolver and the column convolver of
the discrete wavelet transform chip 122 are stopped by freezing all the control signals except that line delays
1334 and 1340 are read in sequential fashion and written to the Y portion of the new portion of memory unit
116 as illustrated in Figure 45. In this fashion, two rows of memory locations which were previously read in
time periods 0 through 319 are now overwritten with the resulting octave 1 transformed values in periods 320
through 639. Only one RAS cycle is required to write the transformed data values in time periods 320 through
479. Similarly, only one RAS cycle is required to write transformed data values during time periods 480 through
639. This results in significantly faster accessing of memory unit 116. Because dynamic random access
memory can be used to realize memory unit 116 rather than static random access memory, system cost is reduced
considerably.
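
The benefit of the row-at-a-time ordering can be made concrete by counting row changes. The sketch below is illustrative only: count_ras_cycles is a hypothetical helper, one RAS cycle is charged whenever the row address differs from the previously accessed row, and the alternating pattern used for comparison is an assumed simplification of the Figure 44 ordering.

```python
from typing import Iterable, List, Optional, Tuple

ROW_WIDTH = 160  # data values per row, as in the R0..R159 / R160..R319 example


def count_ras_cycles(accesses: Iterable[Tuple[int, int]]) -> int:
    """Count RAS cycles for a sequence of (row, column) accesses."""
    ras = 0
    open_row: Optional[int] = None
    for row, _column in accesses:
        if row != open_row:   # row address changed: a RAS cycle is needed
            ras += 1
            open_row = row
    return ras


# Figure 45 style: read two full rows, then write two full rows (640 accesses).
burst: List[Tuple[int, int]] = [
    (row, col) for _pass in range(2) for row in (0, 1) for col in range(ROW_WIDTH)
]

# Assumed Figure 44 style: reads and writes alternate between two different rows.
alternating: List[Tuple[int, int]] = [
    (i % 2, (i // 2) % ROW_WIDTH) for i in range(640)
]
```

Under these assumptions the burst ordering needs 4 RAS cycles for 640 accesses, whereas the alternating ordering needs one RAS cycle per access.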

In accordance with this embodiment of the present invention, the output OUT2 of the column
convolver of the video encoder/decoder circuit 112 is coupled to the output leads of block 1332 as illustrated
in Figure 13. However, in the forward or inverse transform of any other octave, the output leads OUT2 are
coupled to the line delay 1340. Accordingly, in an embodiment in accordance with the memory accessing scheme
illustrated in Figure 45, a multiplexer (not shown) is provided to couple either the output of line delay 1340 or
the output of adder block 1332 to the output leads OUT2 of the column wavelet transform circuit 704 of Figure
13.

Figure 46 illustrates another embodiment in accordance with the present invention. Memory unit 116
contains a new portion and an old portion. Each of the new and old portions contains a sub-band decomposition.
Due to the spatial locality of the wavelet sub-band decomposition, each two-by-two block of low pass
component data values has a high pass component consisting of three trees of high frequency two-by-two blocks of
data values. For example, in a three octave sub-band decomposition, each two-by-two block of low pass
component data values and its associated three trees of high pass component data values forms a 16-by-16 area
of memory which is illustrated in Figure 46.
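
The 16-by-16 figure can be verified by counting data values. The sketch below assumes the usual dyadic layout, in which each of the three high pass trees contributes a 2-by-2 block at the coarsest octave and a block of twice the side length at each finer octave; the function name tree_area_values is hypothetical.

```python
def tree_area_values(octaves: int = 3) -> int:
    """Data values in one 2-by-2 low pass block plus its three high pass trees."""
    low_pass = 2 * 2
    high_pass = 0
    side = 2                            # block side at the coarsest octave
    for _ in range(octaves):
        high_pass += 3 * side * side    # one block per tree at this octave
        side *= 2                       # blocks double in side at each finer octave
    return low_pass + high_pass
```

For three octaves this gives 4 + 3 x (4 + 16 + 64) = 256 data values, which is exactly a 16-by-16 area.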

In order for memory unit 116 to be realized in dynamic random access memory (DRAM), the static random
access memories (SRAMs) 4600, 4601, 4602 and 4603 which are used as line delays in the discrete wavelet
transform circuit 122 are used as cache memory to hold one 16-by-16 block in the new portion of memory unit
116 as well as one 16-by-16 block in the old portion of memory unit 116. This allows each 16-by-16 block of
dynamic random access memory realizing the new and old portions of memory unit 116 to be accessed using
at most sixteen row address strobe (RAS) cycles. Consequently, the video encoder/decoder chip 112 can use
dynamic random access memory for memory unit 116 rather than static random access memory, thereby reducing
system cost.
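
The bound of sixteen RAS cycles can be illustrated with a simple counting model. The Python sketch below is not a description of the disclosed circuit. It assumes, purely for illustration, that data values are stored in raster order, that one image line of a hypothetical width of 640 values occupies one DRAM row, and that a new RAS cycle is needed only when the DRAM row changes.

```python
def block_addresses(x0, y0, width, side=16, row_major=True):
    """Linear addresses of a side-by-side block stored in raster order."""
    if row_major:
        return [(y0 + dy) * width + (x0 + dx)
                for dy in range(side) for dx in range(side)]
    return [(y0 + dy) * width + (x0 + dx)
            for dx in range(side) for dy in range(side)]


def count_ras_cycles(addresses, page_size):
    """Count row changes; each change costs one RAS cycle in this model."""
    ras_cycles = 0
    open_row = None
    for address in addresses:
        row = address // page_size
        if row != open_row:
            ras_cycles += 1
            open_row = row
    return ras_cycles


if __name__ == "__main__":
    width = 640  # hypothetical line width, also used as the DRAM page size
    by_rows = block_addresses(32, 48, width, row_major=True)
    by_columns = block_addresses(32, 48, width, row_major=False)
    print(count_ras_cycles(by_rows, width))     # one RAS cycle per block row
    print(count_ras_cycles(by_columns, width))  # one RAS cycle per access
```

Under these assumptions, reading the block one row at a time needs sixteen RAS cycles, each followed by sixteen column accesses, whereas visiting the same 256 locations column by column would need a RAS cycle for every access.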

Figure 47 illustrates a time line of a sequence of operations performed by the circuit illustrated in Figure
46. In a first time period, old 16-by-16 block 3 is read into SRAM 1 4601. Because there is only one set of data
pins on video encoder/decoder chip 112 for accessing memory unit 116, the 16-by-16 block 0 of the new portion
of memory unit 116 is read into SRAM 0 4600 in the second time period. Bidirectional multiplexer 4604 is controlled
by select inputs 4605 to couple the 16-by-16 block of old data values now present in SRAM 1 4601 to
the bidirectional input port old 4606 of the tree processor/encoder/decoder circuit 124. Similarly, the 16-by-16
new data values present in SRAM 0 4600 are coupled to the input port new 4607 of the tree processor/encoder/decoder
circuit 124. Accordingly, the tree processor/encoder/decoder circuit 124 performs tree processing
and encoding in a third time period. During the same third time period, the inverse quantized old 16-by-16
block is rewritten into SRAM 1 4601 through multiplexer 4604. In a fourth time period, old 16-by-16 block
2 is read into SRAM 2 4602. Subsequently, in the fifth time period a 16-by-16 block of new data values is read
from memory unit 116 into SRAM 0 4600. The new and old 16-by-16 blocks are again provided to the tree
processor/encoder/decoder for processing, the inverse quantized 16-by-16 old block being written into SRAM
2 4602. During the period of time when the tree processor/encoder/decoder circuit 124 is performing tree processing
and encoding, the inverse quantized 16-by-16 block in SRAM 1 4601 is written back to 16-by-16 block
3 of the old portion of memory unit 116. Subsequently, in the seventh time period, 16-by-16 block 5 of the old
portion of memory unit 116 is read into SRAM 1 4601 and in the eighth time period the 16-by-16 block of new
data values 4 in memory unit 116 is read into SRAM 0 4600. In the ninth time period, tree processor/encoder/decoder
circuit 124 processes the 16-by-16 new and old blocks 4 and 5 while the 16-by-16 block of inverse
quantized data values in SRAM 2 4602 is written to 16-by-16 block 2 in the old portion of memory unit 116.
This pipelining technique allows the dynamic random access memory (DRAM) to be accessed during each time
period by taking advantage of the time period when the tree processor/encoder/decoder circuit 124 is processing
and not reading from memory unit 116. Because all accesses of memory unit 116 are directed to 16-by-16
blocks of memory locations, the number of CAS cycles is maximized. Arrows are provided in Figure 46
between memory unit 116 and video encoder/decoder circuit 112 to illustrate the accessing of various 16-by-16
blocks of the new and old sub-band decompositions during different time periods. However, because
video encoder/decoder chip 112 only has one set of data leads through which data values can be read from
and written to memory unit 116, the input/output ports on the right sides of dual port static random access memories
4600-4602 are bussed together and coupled to the input/output data pins of the video encoder/decoder
chip 112.
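
The sequence of time periods described for Figure 47 can be summarized in a small scheduling model. The Python sketch below is illustrative only: the function name and the tuple layout are hypothetical, and the index 1 used for the new block read in the fifth time period is an assumption, since the description of that period does not name the block.

```python
def pipeline_schedule(block_pairs):
    """Return (period, memory_operation, processor_operation) tuples.

    block_pairs is a list of (old_block, new_block) indices. Old blocks
    alternate between SRAM 1 and SRAM 2, new blocks always use SRAM 0, and
    the previous inverse quantized old block is written back to memory while
    the tree processor works on the current pair.
    """
    schedule = []
    period = 1
    pending = None  # (sram, old_block) still waiting to be written back
    for index, (old_block, new_block) in enumerate(block_pairs):
        old_sram = 1 if index % 2 == 0 else 2
        schedule.append((period, f"read old {old_block} -> SRAM {old_sram}", "idle"))
        schedule.append((period + 1, f"read new {new_block} -> SRAM 0", "idle"))
        if pending is None:
            memory_operation = "idle"
        else:
            memory_operation = f"write SRAM {pending[0]} -> old {pending[1]}"
        schedule.append((period + 2, memory_operation,
                         f"process new {new_block} / old {old_block}"))
        pending = (old_sram, old_block)
        period += 3
    return schedule


if __name__ == "__main__":
    # New block index 1 is assumed for illustration; the others follow the text.
    for entry in pipeline_schedule([(3, 0), (2, 1), (5, 4)]):
        print(entry)
```

In this model the memory is idle only during the first processing period, when no earlier block is waiting to be written back. In every later processing period the single set of data pins carries a write-back instead.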

In order to avoid the necessity of providing an additional memory to realize first-in-first-out (FIFO) memory
120, SRAM 3 4603, which is used as a line delay in the column convolver of the video encoder/decoder chip
112, is coupled to the tree processor/encoder/decoder circuit 124 to buffer the compressed data stream for
encoding and decoding operations between the ISA bus 106 and the video encoder/decoder chip 112. This
sharing of SRAM 3 is possible because the discrete wavelet transform circuit 122 operates in a first time period
and the tree processor/encoder-decoder circuit 124 operates in a second time period.

When the tree processor/encoder/decoder circuit 124 is performing the decoding function, the new portion 
of memory unit 116 is not required and SRAM 0 is unused. The read 0, read 1, and read 4 time periods of the
time line illustrated in Figure 47 are therefore omitted during decoding. 

Although the present invention has been described by way of the above described specific embodiments, 
the invention is not limited thereto. Adaptations, modifications, rearrangements and combinations of various 
features of the specific embodiments may be practiced without departing from the scope of the invention. For 
example, an integrated circuit chip may be realized which performs compression but not decompression and 
another integrated circuit chip may be realized which performs decompression but not compression. Any level 
of integration may be practiced including placing memory units on the same chip with a discrete wavelet transform
circuit and a tree processor/encoder-decoder circuit. The invention may be incorporated into consumer
items including personal computers, video cassette recorders (VCRs), video cameras, televisions, compact
disc (CD) players and/or recorders, and digital tape equipment. The invention may process still image data,
video data and/or audio data. Filters other than four coefficient quasi-Daubechies forward transform filters 
and corresponding four coefficient reconstruction (inverse transform) filters may be used including filters disclosed
in copending Patent Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data Compression
and Decompression". Various start and end forward transform filters and various corresponding start
and end reconstruction (inverse transform) filters may also be used including filters disclosed in copending
Patent Cooperation Treaty (PCT) application filed March 30, 1994, entitled "Data Compression and Decompression".
Tokens may be encoded or unencoded. Other types of tokens for encoding other information including
motion in consecutive video frames may be used. Other types of encoding other than Huffman encoding
may be used and different quantization schemes may be employed. The above description of the preferred 
embodiments is therefore presented merely for illustrative instructional purposes and is not intended to limit 
the scope of the invention as set forth in the appended claims.